Abstract

The prediction and forecasting of violent conflict is of vital importance in
formulating coherent national strategies affecting regional and worldwide stability and
security. Using open source data, this research formulates and constructs a suite of
statistical models that predict future transitions into and out of violent conflict and
forecast the regional and global incidences of violent conflict over a ten-year time
horizon.  A  total  of  thirty  predictor  variables  are  tested  and  evaluated  for  inclusion  in 
twelve  conditional  logistic  regression  models,  which  calculate  the  probability  that  a 
nation  will  transition  from  its  current  conflict  state,  either  “In  Conflict”  or  “Not  in 
Conflict”,  to  a  new  state  in  the  following  year.  These  probabilities  are  then  used  to 
construct  a  series  of  nation-specific  Markov  chain  models  that  forecast  violent  conflict,  as 
well  as  yield  insights  into  regional  conflict  trends  out  to  year  2024  and  beyond.  The 
logistic  regression  models  proposed  in  this  study  achieve  training  dataset  accuracies  of 
88.76%,  and  validation  dataset  accuracies  of  84.67%.  Additionally,  the  Markov  models 
achieve three-year forecast accuracies of 85.16% during model validation. Given the
current state of the included predictor variables, this study predicts that global violent
conflict rates will remain constant through year 2024 but will increase beyond
that timeframe, with 95 of the 182 considered nations projected to be in a state of violent
conflict, up from the 84 nations currently in conflict.

KEYWORDS:  Conflict  Transitions,  Logistic  Regression,  Markov  Models 
A LOGISTIC REGRESSION AND MARKOV MODEL FOR THE PREDICTION
OF  NATION-STATE  VIOLENT  CONFLICTS  AND  TRANSITIONS 


I. Introduction

“It makes no difference what men think of war, said the judge. War endures. As well ask
men what they think of stone. War was always here. Before man was, war waited for
him. The ultimate trade awaiting its ultimate practitioner.”

Cormac  McCarthy,  Blood  Meridian 


1.1  General  Issue 

Violent  conflict  between  competing  groups  has  been  a  pervasive  and  driving  force 
for all of human history. It has evolved from small skirmishes between groups
wielding rudimentary weapons to industrialized global conflagrations. Global
incidences  of  violent  conflict  are  at  historically  high  levels,  with  223  individual  ongoing 
violent  conflicts  occurring  throughout  the  globe,  as  shown  in  Figure  1  (Heidelberg 
Institute  for  International  Conflict  Research,  2014).  While  some  of  these  conflicts  are 
new,  many  have  been  ongoing  for  a  decade  or  more,  with  no  potential  resolution  in  sight. 
Many  recent  studies  have  focused,  with  much  success,  on  identifying  the  factors  relevant 
for the accurate prediction of armed conflict in nations. However, these studies have
mainly  focused  on  predicting  conflict  in  the  following  year  or  two.  While  there  is  much 
to  be  gained  from  these  analyses,  a  more  operationally  relevant  question  is:  where  and 
when  will  conflict  transitions  occur?  A  conflict  transition  is  an  event  in  which  a  nation 
transitions  into  or  out  of  a  state  of  violent  conflict. 

Conflict transitions by their very definition are rare events, and while some
conflicts are brought about by unforeseen “Black Swan” events, many times there are
overt but subtle indicators that a conflict is becoming more likely. Moreover, research in
support  of  this  study  has  identified  a  trend  that,  once  a  nation  enters  a  certain  conflict 
state,  it  tends  to  remain  in  such  a  state  until  some  new  event  or  events  occur  to  disrupt 
this “conflict inertia”. To answer the question of when and where conflict
transitions will occur, this study develops a collection of conditional logistic regression
and Markov chain models that predict these transitions and subsequently forecast
global conflict incidences using open source data.


Figure  1:  National  Level  Violent  Conflict  in  2014 

(Heidelberg  Institute  for  International  Conflict  Research,  2014) 


1.2  Problem  Statement 

Use  open  source  data  to  develop  statistical  models  that  predict  and  lend  insight 
concerning  when  and  where  the  world’s  nations  transition  into  or  out  of  violent  conflict. 
1.3 Research Objective and Focus

The objective of this study is to predict future worldwide and long-term conflict
trends and national conflict transience indices, and to identify the exacerbating and/or
enabling factors that lead to increased or decreased probabilities of conflict transitions.
This study utilizes Markov modeling methods supported by conditional logistic
regression models to predict transitions into and out of violent conflict and
subsequently to forecast global incidences of violent conflict. This study analyzes
global conflict and its contributing factors for the years 2004 through 2014.

1.4  Research  Questions 

This  study  seeks  to  answer  the  five  following  research  questions  pertaining  to  the 
prediction  of  conflict  transitions  and  forecasting  of  global  conflict. 

Question  1 

How  accurately  can  statistical  models  predict  conflict  transitions  for  individual 
nations? 

Question  2 

What  factors  are  the  significant  predictors  of  conflict  transitions? 

Question  3 

How  is  the  number  of  global  conflicts  predicted  to  change  by  2024  and  beyond? 

Question  4 

What  nations  are  susceptible  to  conflict  transitions;  which  nations  appear 
invulnerable  to  conflict  transitions? 
Question 5

Which nations, currently not in conflict, are identified as near-term risks for
transitions into violent conflict?


1.5  Methodology 

This study compiles and formats over 30 disparate databases into a single
conditional conflict database (CCD). This effort is required for the development of
twelve region-specific conditional logistic regression models: one set for nations
classified as “in conflict” and the other set for nations classified as “not in conflict”.
Using these conditional logistic regression models, we develop nation-specific Markov
models to forecast conflict status transitions and future conflict trends for 182 of the
world’s nations. The complete study methodology is presented in Figure 2.
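
As a minimal sketch of how two conditional transition probabilities can drive a nation-specific two-state Markov chain, consider the Python example below. The probability values, the function names, and the assumption that the probabilities stay fixed over the forecast horizon are illustrative only; they are not taken from the fitted models in this study.

```python
import numpy as np

def transition_matrix(p_enter, p_exit):
    """Build a 2x2 transition matrix.

    Row/column 0 = "Not in Conflict", row/column 1 = "In Conflict".
    p_enter: P(In Conflict next year | Not in Conflict now)
    p_exit:  P(Not in Conflict next year | In Conflict now)
    """
    return np.array([[1.0 - p_enter, p_enter],
                     [p_exit, 1.0 - p_exit]])

def forecast(start_state, P, years):
    """Return P(In Conflict) for each of the next `years` years."""
    dist = np.zeros(2)
    dist[start_state] = 1.0          # the current state is known
    in_conflict = []
    for _ in range(years):
        dist = dist @ P              # advance the chain by one year
        in_conflict.append(float(dist[1]))
    return in_conflict

# Hypothetical probabilities, standing in for logistic regression outputs.
P = transition_matrix(p_enter=0.10, p_exit=0.20)
ten_year = forecast(start_state=0, P=P, years=10)
```

Under these assumed values, the probability of being in conflict rises each year toward the chain's long-run share, p_enter / (p_enter + p_exit).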


[Figure 2: flowchart. Recoverable stage labels: Database Consolidation & Development; Conditional Logistic Regression Model Development; Construct 12 Conditional Regional Models; Model Analysis]

Figure 2: Study Methodology
1.6 Study Assumptions and Limitations
Assumptions 

Four  underlying  assumptions  were  required  to  proceed  with  the  methodology  and 
analysis  in  this  study.  Similar  to  previous  conflict  prediction  studies,  this  research  first 
assumes  the  existence  of  statistical  and  trend  variables  that  are  accurate  and  viable 
predictors  of  violent  conflict.  Second,  this  study  assumes  that  any  variable  identified  as 
significant  within  the  model  will  remain  relevant  from  year  to  year,  and  for  the  duration 
of the conflict forecasting period. Third, this study assumes that the six geographic
regions utilized for the development of the conditional logistic regression models provide
suitable commonality in terms of economy, geography, and ethnic and religious
demographics to facilitate the modeling effort. Finally, to support the forecasting of
global  conflict,  this  study  assumes  that  regional  factors  relevant  to  conflict  remain 
unchanged  throughout  the  forecasting  period. 

Limitations 

Data  availability  is  this  study’s  single  greatest  limitation,  mandating  a 
combination  of  data  lag  prediction  and  data  imputation  for  the  development  of  the 
conditional  logistic  regression  models.  Data  lag  prediction  refers  to  the  requirement  of 
using  data  sets  that  may  be  one-to-three  years  behind  the  dependent  variable,  a 
suboptimal  but  proven  method  for  the  prediction  of  violent  conflict  in  nations 
(Boekestein,  2015).  Missing  data  further  exacerbates  the  lag  in  the  data  sets,  and  it  must 
be  accounted  for  using  statistical  data  imputation  methods  available  in  commercial 
software.  In  addition  to  these  limitations,  this  study  requires  an  expanded  conflict  data 
set,  spanning  the  years  2004  through  2014,  in  order  to  capture  enough  instances  of 
conflict transitions to effectively build the regional logistic regression models. While an
expanded  data  set  is  not  in  itself  a  limitation,  the  dynamic  conditions  of  the  contemporary 
operating  environment  do  not  wholly  resemble  the  conditions  present  a  decade  earlier, 
which  may  result  in  a  loss  of  fidelity  in  some  of  the  final  recommended  models 
constructed  using  these  older  data  sets.  In  addition  to  the  independent  variables,  the 
dependent  variable  “transitions  into  conflict”  is  limited  by  the  availability  of  data 
provided by the Heidelberg Institute for International Conflict Research (HIIK). This variable is
derived  from  the  HIIK’s  annually  published  Conflict  Barometer;  the  2014  Conflict 
Barometer  is  the  most  current  available  publication,  and  thus  year  2014  sets  the 
benchmark  for  all  forecasting  analyses  conducted  in  this  study. 

1.7  Implications 

Dr.  George  Box  once  remarked  that  “Essentially,  all  models  are  wrong,  but  some 
are  useful”  (Box,  1979).  While  predictive  accuracy  is  an  important  aspect  of  this  study,  it 
was  never  the  goal  to  develop  a  model  with  perfect  accuracy.  Instead,  it  is  the  goal  of 
this  study  to  gain  relevant  and  actionable  insights  from  the  suite  of  models  developed 
herein.  These  insights  include  identifying  the  regional  factors  relevant  to  conflict 
transitions,  nations  susceptible  or  “immune”  to  these  transitions,  and  regional  conflict 
trends  that  may  impact  future  policy  decisions.  It  is  the  expectation  of  this  research  to 
provide  commanders  and  national  level  leadership  an  accurate  and  tractable  analysis  to 
aid  the  development  and  execution  of  future  foreign  policy  and  security  strategies. 
1.8 Overview of Remaining Chapters

Including the introduction, this thesis comprises five chapters. Chapter 2
reviews previous studies and literature pertaining to conflict prediction, logistic
regression, and Markov models. The detailed literature review is vital to narrow the
research scope of this study and provide insights into viable methods for the modeling
and prediction of violent conflict transitions. Chapter 3 presents an in-depth discussion
of the database design methodology, mathematics, notation, modeling approach, and
software required to answer the study questions. Chapter 4 provides a validation of both
the conditional logistic regression and Markov models, and presents a comprehensive
analysis of the results obtained from said models. Finally, Chapter 5 summarizes the
methodology, results, conclusions, and limitations of this research, and proposes
operationally relevant studies that may capitalize on the methodology and results
developed in this thesis.
II. Literature Review


“Therefore, just as water retains no constant shape, so in warfare there are no constant
conditions.”


Sun  Tzu,  The  Art  of  War 


2.1  Overview 

This  research  is  an  effort  to  define  a  methodology  for  the  use  of  multi-state 
Markov  chain  models  (MCM)  for  the  prediction  of  nation-state  transitions  into  and  out  of 
violent  conflict.  With  this  objective  in  mind,  this  chapter  is  broken  down  into  five 
sections,  beginning  with  this  overview.  The  second  section  of  this  chapter  is  a  survey  of 
previous nation-state conflict prediction studies, with a focus on models predicting
conflicts post-2001, their methodologies, and their subsequent predictive success rates. The
third section reviews non-conflict-oriented prediction studies utilizing multi-state Markov
models, with an emphasis on viral epidemiology and spatial relations. The final section
provides  a  synopsis  and  definitions  for  the  different  levels  of  conflict,  which  may  be 
modeled  as  states  within  the  MCM,  and  examines  common  prediction  variables  used  in 
previous  studies  as  well  as  additional  variables  that  may  be  relevant  to  this  analysis.  This 
review  is  not  exhaustive;  instead  it  examines  the  variables  and  regional  dynamics, 
highlighting  the  nature  and  factors  unique  to  modern  violent  conflict. 

2.2  Nation-State  Conflict  Prediction:  Relevant  Research 

For  the  purpose  of  this  research,  we  are  primarily  concerned  with  analytical  and 
predictive  studies  conducted  during  the  Era  of  Persistent  Conflict:  the  period  following 
the  terrorist  attacks  of  September  11th,  2001,  influenced  by  the  dynamic  and  unique 
challenges posed by the modern international political landscape. The seminal work, A
Global Forecasting Model of Political Instability, a Central Intelligence Agency (CIA)
funded study led by Dr. Jack A. Goldstone as part of the CIA’s Political Instability Task
Force, derived a series of models predicting political instability two years prior to event
onset (Goldstone et al., 2005). Utilizing a set of global, open-source data spanning the
time frame of 1955-2003, the CIA study compiled an exhaustive list of instability
events, with the final problem set including nearly 300 “Adverse Regime Changes”,
“Ethnic Wars”, “Revolutionary Wars”, and “Genocides/Politicides” (Goldstone et al.,
2005). The study’s dependent variable was the onset of political instability brought about
by the occurrence of one or more of the problem set events. Multiple methodologies,
including event history models, logistic regression, neural networks, and Markov
processes, were employed to identify factors associated with political instability, the onset
of which is considered a rare event given the definitions laid out in their study. As a result,
the case control method, common in epidemiological analysis of rare occurrences,
became their primary methodology (Goldstone et al., 2005). The study initially tested
hundreds of variables under the assumption that the complexity associated with the onset
of political instability would require an equally complex model or set of models, each
specific to regime type and problem set event (Goldstone et al., 2005). In actuality, the
CIA-funded study determined these initial assumptions to be incorrect, noting that a
small subset of the original variables and a relatively simple model were sufficient to
model political instability across various regime types.

What separates the CIA-funded study from past conflict and political instability
prediction studies is its ability to significantly reduce the unexplained variance in the
model. Previous quantitative studies sought only to find statistically significant
variables, but paid little attention to the variables’ ability to explain variance within the
overall model (Ward, Greenhill, & Bakke, 2010). As Dr. Michael Ward adroitly points
out, these studies spent significant effort in the pursuit of statistically significant
variables but little effort in determining which variables actually
improve the predictive ability of the models. As a result, most of these models fail to
achieve predictive accuracy rates in excess of 50%, and oftentimes are convoluted and
difficult to interpret. For nearly three years, the Political Instability Task Force struggled
to develop a model having an accuracy greater than 60-70%. However, their
methodology combined with an internally developed four-part regime categorization
yields postdictive accuracy rates of 80% or greater (Goldstone et al., 2005). Nevertheless,
the CIA-funded study can only achieve these postdictive rates on a subset of randomly
sampled, politically vulnerable nation-states, and thus cannot achieve “whole world”
accuracy (Boekestein, 2015).

In  2007,  the  Center  for  Army  Analysis  (CAA)  initiated  the  Forecast  and  Analysis 
of  Complex  Threats  (FACT)  study,  which  eventually  became  a  series  of  four  studies 
(FACT  I-IV),  each  refining  the  data  and  methodology  of  the  previous  study.  The  original 
study  directors  Shearer  and  Marvin  sought  to  develop  a  methodology  to  “predict  the 
future  conflict  of  select  nation-states,  but  in  a  manner  that  facilitated  explanation;”  in 
essence  a  relatively  simple  model  that  was  still  relevant  to  the  Army  Staff  (Shearer  & 
Marvin, 2010). Conflict data used in the FACT studies were collected from the
Heidelberg Institute for International Conflict Research (HIIK), which at the time
classified conflict intensity levels into six categories: No Conflict, Dispute, Latent
Conflict, Crisis, Severe Crisis, and War; these categories have since been updated to: 0 -
No Conflict, 1 - Dispute, 2 - Non-Violent Crisis, 3 - Violent Crisis, 4 - Limited War,
and 5 - War (Heidelberg Institute for International Conflict Research, 2014). As part of
the methodology, the FACT study maps the four highest HIIK intensity levels to two
categories: Conflict and Peace (Shearer & Marvin, 2010). In addition to the HIIK data,
the FACT studies utilized a variety of open source governmental and non-governmental
databases, such as those of the World Bank, the Food and Agriculture Organization of the
United Nations, and the Polity IV Project, to gather feature (macro-structural indicator) data.
The methodologies employed in FACT I-III used a common weighted moving average
forecasting model combined with an algorithm, known as K-Nearest Neighbor, to classify
their specific future feature vectors. Principal component
analysis (PCA) was used to create the multiple features employed in the study, ultimately
maximizing the explained variance within the data. The K-Nearest Neighbor algorithm
then classifies each of these feature vectors as a function of the K closest past feature
vectors, with decision rules requiring either a simple or super-majority for a classification
of Conflict or Peace, with best results occurring when K = 7 (Shearer & Marvin, 2010).
The  FACT  studies  yielded  accuracies  in  excess  of  85%  when  the  predicted  nation  scores 
were  classified  as  conflict,  peace,  or  uncertain.  However,  this  high  postdictive  accuracy 
is  due  to  the  fact  that  25%  of  the  157  considered  nations  are  categorized  as  “uncertain”, 
reducing  the  overall  confidence  in  the  predictive  ability  of  the  model. 
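
The voting rule described above can be sketched as follows. This is an illustration only: the Euclidean distance metric, the function name `knn_classify`, the `min_votes` parameter, the treatment of a failed vote as "Uncertain", and the toy data are all assumptions, not details confirmed by the FACT studies.

```python
import numpy as np

def knn_classify(past_X, past_y, x_new, k=7, min_votes=4):
    """Classify a forecast feature vector from its k nearest past vectors.

    past_X: (n, d) array of past feature vectors
    past_y: (n,) array of labels, 1 = Conflict, 0 = Peace
    min_votes: votes needed for a label; 4 of 7 is a simple majority,
               a larger value (e.g. 6 of 7) acts as a super-majority.
    """
    dists = np.linalg.norm(past_X - x_new, axis=1)   # assumed Euclidean
    nearest = np.argsort(dists, kind="stable")[:k]
    conflict_votes = int(past_y[nearest].sum())
    peace_votes = k - conflict_votes
    if conflict_votes >= min_votes:
        return "Conflict"
    if peace_votes >= min_votes:
        return "Peace"
    return "Uncertain"                               # neither side reached the bar

# Toy one-dimensional feature data (hypothetical).
past_X = np.array([[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [5.0]])
past_y = np.array([1, 1, 1, 1, 0, 0, 0, 0])
x_new = np.array([0.3])

simple = knn_classify(past_X, past_y, x_new, k=7, min_votes=4)
strict = knn_classify(past_X, past_y, x_new, k=7, min_votes=6)
```

In this toy case the seven nearest neighbors split four to three, so the simple-majority rule returns a label while the stricter rule does not, which illustrates how a super-majority requirement can enlarge the "uncertain" category.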

In his 2011 paper, Predicting Armed Conflict, 2010-2050, Hegre employs
dynamic multinomial logit model estimation techniques to develop a three-state transition
probability matrix capable of predicting changes in global and regional incidences of
armed conflict out to year 2050. The Hegre study created and used the Uppsala/PRIO
conflict data set, consolidating relevant data for 169 countries from 1970 to 2009. The
Uppsala/PRIO data reports three conflict levels: “No Conflict”, or fewer than 25
combat-related deaths per year; “Minor Conflict”, or between 25 and 999 combat-related
deaths per year; and “Major Conflict”, when 1,000 or more combat-related deaths are
reported in a year (Hegre et al., 2011). The primary predictive methodology employed by
Dr. Hegre was a C++ simulation built upon a statistical model of conflict onset,
escalation, and termination dependent on a set of both endogenous and exogenous
variables (Hegre et al., 2011). The methodology employs a nine-step process: (1)
estimating the underlying statistical model through dynamic multinomial estimation; (2)
developing assumptions about the distribution of the exogenous variables; (3) simulating
conflicts for the current year; (4) drawing a realization of the coefficients of the
multinomial logit model (Equation 8); (5) calculating the nine probabilities of transition
between states shown in Table 1; (6) randomly drawing whether a country experiences
conflict, based on the estimated probabilities; (7) updating the values of the explanatory
variables; (8) repeating steps (4)-(7) for each year of the forecast; and finally (9)
repeating steps (3)-(8) a number of times to even out the impact of individual
realizations of the multinomial logit coefficients and the individual values of the
probability distributions (Hegre et al., 2011).
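
The simulation loop in steps (3) through (9) can be sketched roughly as below. This is a simplified illustration, not Hegre's C++ implementation: the coefficient values, their covariance, the single explanatory variable, and the use of previous-state dummies in place of interaction terms are all assumptions. Steps (1), (2), and (7) are omitted, so the explanatory variable is held fixed.

```python
import numpy as np

N_STATES = 3  # 0 = No Conflict, 1 = Minor Conflict, 2 = Major Conflict

def transition_probs(beta, x, prev_state):
    """Multinomial logit probabilities for the state at t given the state at t-1.

    beta has one row each for Minor and Major Conflict; No Conflict is the
    baseline outcome with a linear predictor fixed at zero.
    """
    prev_dummies = np.eye(N_STATES)[prev_state]    # one-hot previous state
    z = np.concatenate([x, prev_dummies])
    scores = np.concatenate([[0.0], beta @ z])
    weights = np.exp(scores - scores.max())        # numerically stable softmax
    return weights / weights.sum()

def simulate(beta_hat, beta_cov, x, start_state, years, runs, seed=0):
    """Return, per forecast year, the share of runs ending in each state."""
    rng = np.random.default_rng(seed)
    counts = np.zeros((years, N_STATES))
    for _ in range(runs):                          # step (9): repeat the simulation
        state = start_state                        # step (3): current-year state
        for year in range(years):                  # step (8): loop over years
            draw = rng.multivariate_normal(beta_hat.ravel(), beta_cov)
            beta = draw.reshape(beta_hat.shape)    # step (4): draw coefficients
            p = transition_probs(beta, x, state)   # step (5): transition probabilities
            state = int(rng.choice(N_STATES, p=p)) # step (6): draw next state
            counts[year, state] += 1
    return counts / runs

# Hypothetical inputs: one explanatory variable plus three previous-state dummies.
x = np.array([0.5])
beta_hat = np.array([[0.8, -3.0, 1.5, 0.5],
                     [1.0, -5.0, 0.0, 2.0]])
beta_cov = 0.01 * np.eye(beta_hat.size)
shares = simulate(beta_hat, beta_cov, x, start_state=0, years=5, runs=200)
```

Averaging over many runs is what smooths out the effect of any single coefficient draw or state draw on the forecast shares.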

A dynamic multinomial logit model was used to estimate the probability
transition matrix, with the outcome at time t based on the set of indicator variables at t-1.
The model is identified by setting the baseline outcome to j = 0, “No Conflict”, resulting
in the estimates $\beta_1$ and $\beta_2$ being interpreted as the impact of the explanatory variable, x,
on the probability of being in “Minor Conflict” and “Major Conflict” relative to “No
Conflict” (Hegre et al., 2011). Essentially, this model shows which variables increase
the risk of conflict onset; however, the predicted duration of the conflict is calculated
through the use of interaction terms between the states at t-1 and the predictor variables,
producing the transition probability matrix shown in Table 1.

Table 1: Transition Probability Matrix: Conflict at t vs. t-1, 1970-2009
(Hegre, Karlsen, Nygard, Strand, & Urdal, 2011)

                                     Conflict at t
Conflict at t-1   No Conflict    Minor Conflict   Major Conflict   Total
No Conflict       5116 (0.966)   156 (0.029)      23 (0.004)       5295 (1.000)
Minor Conflict    145 (0.207)    481 (0.689)      72 (0.103)       698 (1.000)
Major Conflict    24 (0.070)     70 (0.205)       247 (0.724)      341 (1.000)
Observations      5285           707              342              6334

Row proportions in parentheses
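
The row proportions in Table 1 follow directly from the observed counts, as the short Python check below shows. The three-step matrix at the end is only an illustration under the assumption of a fixed, covariate-free chain; the Hegre model itself lets the transition probabilities vary with the explanatory variables.

```python
import numpy as np

# Observed transition counts from Table 1.
# Rows: state at t-1; columns: state at t (No, Minor, Major Conflict).
counts = np.array([[5116, 156, 23],
                   [145, 481, 72],
                   [24, 70, 247]], dtype=float)

# Divide each row by its total to obtain the empirical transition matrix.
P = counts / counts.sum(axis=1, keepdims=True)

# Illustrative only: three-year-ahead probabilities if P were held fixed.
three_step = np.linalg.matrix_power(P, 3)
```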


The Hegre Model divides the world into nine regions, based upon the observation
that conflict tends to cluster in a few geographical regions that share similar levels
of risk factors such as infant mortality or poverty. These regions
are: South and Central America and the Caribbean; Western Europe, North America, and
Oceania; Eastern Europe; Western Asia and North Africa; West Africa; East and Central
Africa; Southern Africa; South and Central Asia; and Eastern and South East Asia. In
addition, the methodology investigates the “neighborhoods” associated with each nation.
The neighborhood of country A is defined as all n countries $\{B_1, \ldots, B_n\}$ that share a border
with A, where country A shares a border with country $B_i$ if there is less than 100 km
distance between any points of their territories (Hegre et al., 2011). This was an
important factor in their methodology, as it allowed them to model the cross-border
effects of conflict on near neighbors, creating a measure of neighborhood effects relevant
to each country.

The Hegre model differs from previous conflict prediction studies in that it does
not restrict its predictions solely to the onset of conflict, a restriction that excludes
ongoing conflicts. Additionally, the Hegre Model simultaneously predicts conflict onset,
escalation, and termination, allowing for the prediction of both the global and regional
incidence of armed conflict. The prediction horizon is also unique to the Hegre model
due to its length, 7-9 years, with an average postdictive accuracy (across all regions) of
79% and a false positive rate of 8.5%, given a probability threshold of p > 0.3 for the
state of interest (Hegre et al., 2011). As the title of Hegre’s paper indicates, his objective
was to predict conflicts out to year 2050, which he accomplished through the use of
projections of predictor variables, as provided by the UN World Population Prospects and
the International Institute for Applied Systems Analysis (Hegre et al., 2011). Using these
data, Hegre predicts an overall decline in the global incidence level of violent conflict, a
decline attributed to improvements in variables associated with infant mortality,
education, and youth bulges (Hegre et al., 2011). However, since these long-term
predictions are based on projections as opposed to actual data, the Hegre Model estimates
should be interpreted as long-term global, and to a lesser extent regional, conflict trends
given projected conditions, as opposed to specific national level predictions.

Boekestein  conducted  the  most  recent  analysis  concerning  the  prediction  of  future 
nation-state  conflict,  in  his  study  A  Predictive  Logistic  Regression  Model  of  World 
Conflict  Using  Open  Source  Data  (Boekestein,  2015).  As  the  name  implies,  the 
Boekestein model uses logistic regression similar to the CIA-funded and FACT studies to
produce a parsimonious model that is tailored to each of the six geographical regions
identified in his study: Sub-Saharan Africa; South and East Asia; Eastern Europe and
Central Asia; Arab Nations; Organization for Economic Cooperation and Development
(OECD); and Latin America. Together these regions comprise 180 of the 193 United Nations
member nations, with the states of Palestine and Kosovo also included for consideration.
As in previous studies, conflict intensity or level of violence was chosen as the dependent
variable for this study and is based on the levels calculated by the HIIK. The HIIK levels
of violence are calculated using five metrics: Weapons - light or heavy; Personnel - number
engaged per month; Casualties - number per month; Destruction - infrastructure,
accommodation, economy, and culture; and Cross Border Refugees and Internally
Displaced Persons (IDP) - number per month (Heidelberg Institute for International
Conflict Research, 2014). Using these metrics, the HIIK assigns one of the six
aforementioned intensity levels to every identified political conflict. The Boekestein
model subsequently maps these six levels of conflict to two values of the dependent
variable: “Not Violent Conflicts”, Levels 0-2, and “Violent Conflicts”, Levels 3-5.
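
The mapping from the six HIIK intensity levels to a binary state, together with the notion of a conflict transition defined in Chapter I, can be illustrated with a small Python sketch. The function names and the example series of yearly levels are hypothetical, and the input is assumed to contain consecutive years.

```python
def to_binary_state(hiik_level):
    """Map a HIIK intensity level (0-5) to the two-category dependent variable."""
    if not 0 <= hiik_level <= 5:
        raise ValueError("HIIK intensity levels range from 0 to 5")
    return "Violent Conflict" if hiik_level >= 3 else "Not Violent Conflict"

def find_transitions(levels_by_year):
    """Return (year, from_state, to_state) for each year the binary state changes.

    levels_by_year: dict mapping year -> HIIK level, assumed to hold
    consecutive years for a single nation.
    """
    years = sorted(levels_by_year)
    transitions = []
    for prev_year, year in zip(years, years[1:]):
        before = to_binary_state(levels_by_year[prev_year])
        after = to_binary_state(levels_by_year[year])
        if before != after:
            transitions.append((year, before, after))
    return transitions

# Hypothetical intensity history for one nation.
example = {2010: 2, 2011: 3, 2012: 4, 2013: 2, 2014: 2}
found = find_transitions(example)
```

Note that the move from level 3 to level 4 between 2011 and 2012 is not a transition under this binary coding, since both levels fall in the violent category.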

Twenty-two  statistic  and  four  trend  variables  were  considered  for  this  study, 
thirteen  of  which  are  common  to  the  CIA  funded  and  the  FACT  studies  (Boekestein, 
2015).  The  data  supporting  these  variables  is  gathered  from  multiple  sources  to  include 
the  World  Bank,  HIIK,  and  the  CIA  World  Fact  Book,  with  some  sources  maintaining 
data  sets  from  1970.  As  Boekestein  points  out,  many  of  these  data  sets  are  not  complete 
or  available  for  the  current  year  of  the  study,  requiring  a  two  or  three  year  lag  in  the 
model  to  predict  current  year  nation-state  conflict  levels.  Additionally  some  variables 
had  significant  gaps  in  the  data  requiring  imputation  to  complete  the  data  set.  For 
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example  the  data  set  supporting  the  variable  “Conflict  in  Bordering  States”,  whose 
calculation  took  into  account  the  number  of  bordering  nations,  and  the  percent  border 
shared  with  nation  i,  required  the  imputation  of  data  for  29  island  nations  (Boekestein, 
2015).  A  rigorous  variable  screening  process,  to  check  for  collinearity  among  the  set  of 
26  variables  was  implemented  prior  to  model  development  using  three  separate  analysis 
methods.  Despite  the  rigorous  testing,  the  initial  Boekestein  models  failed  to  achieve 
postdictive  accuracy  rates  in  excess  of  76%.  To  improve  the  model,  several  factor 
analysis  and  noise  reduction  techniques  were  used  to  reduce  the  initial  set  of  23  variables 
to  a  set  of  six  factors,  with  highly  correlated  variables  represented  by  a  single  factor 
(Ahner,  Boekestein,  &  Deckro,  2015).  Given  the  nature  of  the  study,  it  was  also 
desirable  to  minimize  the  number  of  false  negative  reports  by  the  model,  i.e.,  the  number 
of  times  the  model  predicts  “Not  in  Violent  Conflict”  when  in  actuality  the  nation  in 
question  is  in  “Violent  Conflict”  (Boekestein,  2015).  This  objective  was  accomplished 
by  adjusting  the  logistic  regression  cutoff  level,  for  which  the  default  setting  was  0.5, 
through extensive sensitivity testing. The testing determined a potential need for an
additional variable to explain a nation’s region, due to the nature of the particular nations
consistently reporting as either false positives or false negatives. This insight led to the
construction of a separate model for each of the six previously identified regions. Each
model  employs  a  specific  subset  of  variables  from  the  original  26  statistic  and  trend 
variables  that  best  describe  the  conflict  risk  factors  unique  to  each  region.  This 
methodology  resulted  in  a  reduction  of  false  negative  predictions  in  the  range  of  2-7%, 
and  a  combined  postdictive  accuracy  for  both  the  model  and  validation  sets  of  80.22%, 
given  a  logistic  regression  cutoff  of  0.28  (Ahner,  Boekestein,  &  Deckro,  2015). 
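
The effect of lowering the logistic regression cutoff can be illustrated with a minimal sketch. Only the two cutoff values (0.5 and 0.28) come from the study discussed above; the probabilities, outcomes, and function names are hypothetical and chosen purely for illustration.

```python
def classify(probabilities, cutoff):
    """Label a case 1 ("Violent Conflict") when its probability meets the cutoff."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def confusion_counts(actual, predicted):
    """Return (true_pos, false_pos, true_neg, false_neg) for binary labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn


# Hypothetical fitted probabilities and observed outcomes (illustrative only).
probs = [0.10, 0.25, 0.30, 0.35, 0.45, 0.55, 0.70, 0.90]
actual = [0, 0, 1, 0, 1, 1, 1, 1]

default_counts = confusion_counts(actual, classify(probs, 0.50))
lowered_counts = confusion_counts(actual, classify(probs, 0.28))
```

In this toy data the lower cutoff removes both false negatives at the cost of one additional false positive, which is the trade-off the sensitivity testing was designed to explore.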

2.3 Markov Models and the Prediction and Spread of Disease Epidemics

The prediction of the outbreak and spread of disease epidemics is in many ways
analogous to the study and prediction of violent conflict and its antecedents. Additionally,
given the various states of disease in which a host, outbreak, or epidemic can exist, such as no
signs of disease, susceptible, infected, and cured, the prediction methodology lends itself
to the use of Markov models. The 2007 paper, Bayesian Markov switching models for
the early detection of influenza epidemics, explores a methodology for the early detection
of influenza outbreaks using a two-state Markovian process. The methodology created
by Martinez-Beneito and his team employs a two-state, or binary, hidden Markov
process in which the population is in a non-epidemic or epidemic phase, states 0 and 1
respectively. The underlying concept of the model is to associate the variable $Y_{i,j}$, the
difference in disease rates between weeks i and i+1 in year j, with $Z_{i,j}$, the unobserved
random variable that indicates the state of the system (Martinez-Beneito et al., 2007).
The model for the $Y_{i,j}$ variable is specific to the state and season of the system, and is
either a Gaussian white-noise process (non-epidemic) or an autoregressive process of
order 1 (epidemic). Upon determination of the model, the parameters $P_{0,0}$ and $P_{1,1}$ were
estimated using the Bayesian paradigm, which requires the specification of prior distributions.
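
The two-state switching structure described above can be sketched as a small simulation. This is only an illustrative forward simulation, not the Bayesian estimation procedure of the cited paper: the function name, the parameter values, the interpretation of `p00` and `p11` as the probabilities of remaining in states 0 and 1, and the assumption that a season starts in the non-epidemic state are all assumptions made here.

```python
import random


def simulate_switching_series(n_weeks, p00, p11, sigma0, sigma1, rho, seed=0):
    """Simulate hidden states Z and differenced rates Y for one season.

    State 0 (non-epidemic): Y is Gaussian white noise.
    State 1 (epidemic): Y follows an AR(1) process with coefficient rho.
    """
    rng = random.Random(seed)
    states, series = [], []
    z, y_prev = 0, 0.0  # assumption: the season starts in the non-epidemic state
    for _ in range(n_weeks):
        stay = p00 if z == 0 else p11
        if rng.random() >= stay:
            z = 1 - z  # switch between the non-epidemic and epidemic phases
        if z == 0:
            y = rng.gauss(0.0, sigma0)
        else:
            y = rho * y_prev + rng.gauss(0.0, sigma1)
        states.append(z)
        series.append(y)
        y_prev = y
    return states, series


# Hypothetical parameter values, for illustration only.
states, series = simulate_switching_series(30, 0.9, 0.8, 1.0, 2.0, 0.5, seed=1)
```

In practice only the series would be observed, and the hidden states would be inferred from it.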

To validate the model’s predictive accuracy, Martinez-Beneito compared its
performance using a near-term partial data set and a complete data set. The model was constructed
using a dataset covering a nine-year period, allowing the team to develop robust estimates
of the various parameters used in the model. However, given the nature of disease
outbreaks, time horizons are measured in weeks as opposed to months or years, requiring
that the model be tested using a limited subset of the near-term preceding weeks. The
results showed that, even with the reduced data set, the model predicted the same
incidence of epidemic in 93% of the scenarios of the model using the full data set, given a
p > 0.30 (Martinez-Beneito et al., 2007).

In his 2004 paper, The analysis of hospital infection data using hidden Markov
models, Cooper proposes a new process to analyze infections that are generally
considered endemic to hospitals and are carried asymptomatically before infections
begin to appear in proportions of the patient population. The data associated with
hospital-acquired infections generally consist of short time series with low
counts (Cooper & Lipsitch, 2004). For his analysis, Cooper stresses the importance of
patient-to-patient transmission, which shares many similarities with conflict spillover from
one state to the next. The transmission chain is modeled using a structured hidden
continuous-time Markov chain over a short time increment h. Germane to this discussion
are the parameters $\beta/N$, the transmission rate to each susceptible patient in population N
given an infected host; v, the probability of being a pathogen host; p, the patient discharge
rate; and $C_t \in \{0, 1, 2, \ldots, N\}$, the state of the system given as the number of infected hosts
at time t (Cooper & Lipsitch, 2004).

In this model new infections arise due to cross-infection, at a rate proportional to
the product of the number of infected hosts, $C_t$, and the number of susceptible patients,
$(N - C_t)$. New infections can also occur in the newly discharged susceptible population
(Cooper & Lipsitch, 2004). In the modeling of cross-border conflict spillover, the
parameter $\beta/N$ can be interpreted as the proportion of a nation’s border that is shared
with a state currently in violent conflict, while v and p are the respective
probabilities of entering and terminating a conflict given a neighboring state is in conflict.
In his concluding remarks, Cooper identifies several limitations associated with his
methodology, in particular that it may not be appropriate for large systems. He
states: “A further limitation is that while such a model may be appropriate for
a  single  ward  or  unit,  for  larger  hospital  populations  made  up  of  several  interacting  units 
its  value  is  not  so  clear”  (Cooper  &  Lipsitch,  2004).  He  ties  the  reason  for  this  limitation 
to  the  model’s  use  of  short  time  series  with  limited  data,  increasing  the  collinearity 
between  multiple  variables.  Therefore,  the  overall  methodology  is  likely  not  appropriate 
for  the  prediction  of  nation-state  conflict,  but  the  modeling  of  disease  transmission  gives 
insight  on  how  to  possibly  model  cross-border  conflict  spillover. 
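
The cross-infection term discussed above, a rate proportional to the product of infected and susceptible counts, can be written as a short sketch. The function names are hypothetical, and the approximation of the probability of a new infection over a short increment h as rate times h is an assumption made here for illustration rather than a detail taken from the cited paper.

```python
def cross_infection_rate(beta, n_patients, infected):
    """Rate of new cross-infections: (beta / N) * C_t * (N - C_t)."""
    if not 0 <= infected <= n_patients:
        raise ValueError("infected must lie between 0 and n_patients")
    susceptible = n_patients - infected
    return (beta / n_patients) * infected * susceptible


def prob_new_infection(beta, n_patients, infected, h):
    """Approximate probability of one new cross-infection in a short interval h."""
    return min(1.0, cross_infection_rate(beta, n_patients, infected) * h)
```

The rate vanishes when no host is infected or when every patient is infected, and peaks when the population is evenly split. Under the spillover analogy, this corresponds to a region in which roughly half of the bordering states are in conflict.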

The final methodology we will explore is the Modeling of Viral Epidemiology in
Connected Networks, discussed by Spears of the Naval Research Laboratory. In this
instance, Spears adapts methodologies for the prediction and spread of disease epidemics
and applies them to the spread of computer viruses in a network. Given the level of
interconnectedness shared by most nations, a result of globalization, it is easy to visualize
the current geo-political topology as a vast network, where conflict in one state sends
shockwaves through the network, eventually affecting numerous other nations. The
methodology for this research employs very general discrete-time Markov chains and
continuous-time differential equations to model the propagation of viral attacks in a
network. The network envisioned in this study consists of N nodes that exist in
one of four medical conditions or states: S, susceptible; E, exposed; I, infected; and C,
cured (Spears, 2001). The discussion of the methodology builds upon two- and three-state
Markov chains, but for the purposes of this discussion we will focus on his
four-condition S-E-I-C model. In this model a susceptible node must be exposed to the
pathogen before it becomes infected, infected before cured, and cured before susceptible again.

The transition associated with this Markov model requires that I' - I more nodes
become infected at time j, where I' can be less than, equal to, or greater than I, with
similar requirements for the other three medical conditions (Spears, 2001). The transition
probabilities for this model take on binomial characteristics, in that a node either exists in a
specific medical condition or it does not. Four variables are employed in this model: $\alpha$,
the probability a susceptible patient becomes exposed to a pathogen; $\beta$, the probability an
exposed patient becomes infected; $\delta$, the probability an infected patient is cured; and $\delta'$, the
probability a cured patient becomes susceptible. In the end, the methodology employed
by Spears may permit the modeling of the spread of violent conflict as a function of
bordering states, or geographic nearest neighbors in the case of island nations.
Additionally, through the depiction of strongly and weakly connected nodes, we have a
methodology that may simulate secure and porous international borders.
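
The S-E-I-C cycle can be sketched as a four-state Markov chain for a single node. This is a deliberate simplification: the network model tracks counts of nodes in each condition, and the exposure probability would in general depend on the state of connected nodes, whereas the sketch below assumes fixed probabilities. The function names and numeric values are hypothetical.

```python
def seic_transition_matrix(alpha, beta, delta, delta_prime):
    """Row-stochastic matrix over the states S, E, I, C (in that order)."""
    return [
        [1 - alpha, alpha, 0.0, 0.0],              # S stays S or becomes E
        [0.0, 1 - beta, beta, 0.0],                # E stays E or becomes I
        [0.0, 0.0, 1 - delta, delta],              # I stays I or becomes C
        [delta_prime, 0.0, 0.0, 1 - delta_prime],  # C stays C or returns to S
    ]


def step(distribution, matrix):
    """Advance a probability distribution over the four states by one time step."""
    size = len(matrix)
    return [
        sum(distribution[i] * matrix[i][j] for i in range(size))
        for j in range(size)
    ]


# Hypothetical probabilities; a node that starts out susceptible.
matrix = seic_transition_matrix(0.2, 0.5, 0.3, 0.1)
after_one_step = step([1.0, 0.0, 0.0, 0.0], matrix)
```

A weakly connected node (a secure border, in the conflict analogy) could be represented by a smaller exposure probability than a strongly connected one.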

2.4  Relevant  Variables 

As  stated  previously,  many  conflict  prediction  studies  have  expended  substantial 
effort  and  resources  in  the  pursuit  of  statistically  significant  variables  while  failing  to 
understand  how  those  variables  improve  the  predictive  qualities  of  their  respective 
models.  When  analyzing  conflict  predictor  variables  used  in  previous  studies,  one  must 
ask:  “will  these  variables  still  remain  significant  in  future  conflicts?”  Furthermore,  will 
variables  currently  identified  as  insignificant  in  current  conflicts  become  significant  as 
the nature of violent conflict evolves? When analyzing and studying different conflict
predictor variables, the analyst must avoid falling into a trap propagated through the
dystopian visions of future conflict that are so common today (Johnson,
2014). The trap is the belief that future global incidences of violent conflict will only
increase, eventually becoming unmanageable by most national governments. However,
as  Hegre  noted,  if  UN  projections  prove  reliable,  his  model  actually  predicts  the  opposite 
outcome,  with  the  global  incidence  of  conflict  decreasing  by  2050.  All  this  being  said,  it 
is  imperative  the  analyst  understands  the  effects  of  historical  predictor  variables  on 
current  conflicts  while  staying  abreast  of  emerging  trends,  predictors,  and  their  effects 
that  will  frame  the  nature  of  future  conflicts. 

The  recent  Boekestein  study  created  a  model  using  27  total  variables,  achieving 
accuracy  rates  in  excess  of  80%  by  region.  Given  these  results,  one  can  assume  the  set  of 
27  statistic  and  trend  variables  represent  a  set  of  available  predictors  that  offer  excellent 
predictive  accuracy  of  modern  violent  conflicts,  if  properly  tailored  for  different  world 
regions.  As  is  seen  in  Table  2,  each  region  in  the  Boekestein  model  has  a  particular 
subset  of  relevant  variables,  with  some  regions  requiring  as  few  as  two  and  other  regions 
as  many  as  nine.  Additionally,  the  importance  of  the  variables,  referenced  by  the  index 
corresponding  to  the  variable-region  intersection  in  Table  2,  is  also  region  specific.  For 
example, individual freedom statistics were shown to be the most significant variable in
three regions: Sub-Sahara Africa, Eastern Europe and Central Asia, and the OECD, but
only the fifth most significant variable for Arab Nations and Latin America. However, this
table  is  not  all-inclusive  due  to  the  absence  of  variables:  border  conflict,  religious 
diversity,  ethnic  diversity,  and  the  HIIK  trend,  which  were  removed  during  final  model 
construction (Boekestein, 2015).

Table 2: Region Specific Relevant Variables
(Boekestein, 2015)

Regions (table columns, left to right): Sub-Sahara Africa; South and East Asia; Arab Nations; Eastern Europe and Central Asia; OECD; Latin America

Significance index of each variable in the regional models that include it (values listed left to right; blank cells for regions that omit the variable are not shown):

Freedom: 1, 5, 1, 1, 5
2 Yr Freedom Trend: 9
3 Yr Freedom Trend: 6
5 Yr Freedom Trend: 8
Regime Type (Central): 7, 4
Regime Type (Democratic): 9
Polity IV: 3, 2
GDP Per Capita: 6
Refugees Asylum: 3, 3, 7
Refugee Origin: 4
Unemployment: 5, 8
Rural Population: 3
Infant Mortality: 2, 4
Caloric Intake: 1, 5
Death Rate: 3, 1, 1
Arable Land: 2
Population Growth Rate: 7
Improved Water: 2
Trade: 4, 2, 4, 2, 6

A  review  of  previous  studies  reveals  that  variables  such  as  political  statistics, 
conflict  history,  infant  mortality  rates  (IMR),  population  statistics,  civil  liberties,  and 
ethnic/religious  dominance  or  diversity  are  frequently  employed  as  significant  predictors 
of violent conflict. In addition to these variables, the Hegre model introduces
variables related to current conflict intensity, education, youth bulges, international
treaties, neighborhood characteristics (a conglomeration of growth rates, per capita GDP,
education levels, IMR, and other political considerations), and oil (Hegre et al., 2011).
The oil variable is of particular interest due to the hypothesis that nation-states whose
GDP is dependent upon primary commodities through export revenue, such as oil, tend
towards weaker governmental institutions, putting them at greater risk for violent conflict
(Hegre et al., 2011).

Economic factors such as gross domestic product (GDP), primary commodity
exports,  and  income  growth  have  also  been  demonstrated  as  significant  predictors  of 
violent  conflict.  As  noted  in  their  2002  paper,  On  the  Incidence  of  Civil  War  in  Africa, 
Collier  and  Hoeffler  employ  an  econometric  model  to  predict  incidences  of  conflict  in 
Africa  (Collier  &  Hoeffler,  2002).  While  their  study  primarily  focused  on  African 
nations,  the  authors  note  that  patterns  of  conflict  in  Africa  largely  resemble  other 
developing  regions  throughout  the  globe,  indicating  that  economic  variables  may  be 
useful  conflict  predictors  in  other  regions.  Similarly,  other  studies  have  shown  that 
population  variables,  specifically  ethnic  and  religious  oriented  statistics,  are  powerful 
predictors of and historical contributors to violent conflict. In the 2001 paper titled
Ethnicity, Insurgency, and Civil War, Fearon and Laitin argue that post-Cold War civil
wars and insurgencies were driven and exacerbated by numerous ethnic and religious
factors, not the least of which was ethnic nationalism (Fearon & Laitin, 2001).

In his book Out of the Mountains: The Coming Age of the Urban Guerrilla,
Kilcullen discusses several drivers of violence and instability that may complement the
current set of common conflict predictors. The basic premise of his work is that future
conflict is likely to occur in the urban sprawl of coastal mega-cities, and in the peri-urban
settlements that exist in Africa, the Middle East, Latin America, and Asia (Kilcullen,
2013). He discusses the growth of criminal violence networks combined with a
simultaneous decay or complete lack of basic infrastructure (such as sanitation) as two
drivers of instability. The respective rates of growth and decay of these two predictors
may serve as viable and significant variables in a predictive model. The same variables,
along with national inflation rates, changes to military expenditure/manning levels, and
the application of international sanctions, are echoed in the paper Statistical Approaches
to Developing Indicators of Armed Conflict, whose purpose is to “explore the feasibility
of developing a meaningful system of indicators of armed violence” (Kisielewski, Rosa,
& Asher, 2010).

2.5  Summary 

In  this  chapter,  we  surveyed  multiple  sources  to  identify  useful  methods,  theories, 
and  variables  to  enable  the  development  of  a  methodology  for  the  prediction  of  violent 
nation-state  conflict  using  Markov  Chain  Models.  Previous  conflict  prediction  studies 
employ  multiple  methodologies  to  include  logistic  regression  and  simulation  with  the 
best  models  achieving  accuracy  rates  in  excess  of  80%.  While  methodologies  and 
variables  differed  between  studies,  the  common  trend  was  to  construct  region-specific 
models  to  better  estimate  the  global  incidence  of  violence.  This  methodology  allows  for 
the  use  of  region-specific  significant  variables,  whose  value  when  applied  to  a  global 
model  may  be  insignificant  or  even  detrimental  to  the  predictive  accuracy  of  the  model. 
Next, we surveyed a group of studies using Markov models for the prediction and analysis
of the spread of disease epidemics, which share common traits with the spread of conflict.
The studies featured in this chapter use multi-state hidden Markov models, emphasizing
patient-to-patient transfer of pathogens. Notable studies combine Markov models with
strongly and weakly connected patient networks, which may provide a suitable
methodology for modeling the nation-states within their various regions. Finally,
we reviewed commonly used and emerging predictor variables that are relevant to modern
conflict. Such variables include recent conflict history, infant mortality rates, various
political and economic statistics, and region type. Subject matter experts also identify
predictors such as crime rates, population migration, and changes to military spending
and force levels as possible drivers of instability in susceptible nations.

III. Methodology


“This is no formula of war. No one dares to arrogantly claim to have the perfect method
in the sphere of war. No one has ever been able to use one method to win all wars. But it
does not mean that there are no rules regarding war.”

Qiao  Liang,  Unrestricted  Warfare 


3.1  Chapter  Overview 

This  research  examines  the  methods  germane  to  the  prediction  and  forecasting  of 
nation-state  violent  conflict  transitions.  Section  3.2  describes  the  methodology  guiding 
the  development  of  the  conditional  conflict  database,  to  include  variable  selection, 
database design, and data imputation. Next, Section 3.3 discusses the mathematical
principles and development of the conditional logistic regression models, to include the
Synthetic Minority Over-sampling Technique. Finally, Section 3.4 describes the theory
and development of the nation-specific Markov models used to determine the near- and
long-term  conflict  trends  of  the  nation-states  examined  in  this  research. 

3.2  Conditional  Conflict  Database  Development 
Nation-State  Case  Selection 

This study examines the incidences of violent conflict for 181 of the 193 member
states of the United Nations, as well as Palestine, which is referred to throughout this study
as the West Bank (United Nations, 2015). The 182 nation-states examined in this
research are consistent with those surveyed in the Boekestein model (Boekestein, 2015).
The 12 member states not considered in this study are Andorra, Dominica, Liechtenstein,
the Marshall Islands, Monaco, Nauru, Palau, Saint Kitts and Nevis, Saint Lucia, Saint
Vincent and the Grenadines, San Marino, and Tuvalu. These nations were omitted due to
their relatively small populations, combined with inadequate or incomplete data. Similar
to  previous  studies,  disputed  territories  and  regions  such  as  Nagorno-Karabakh,  South 
Ossetia,  Western  Sahara,  Somaliland,  and  Taiwan  are  also  omitted  from  consideration. 

Case  selection,  the  specific  years  of  interest  for  each  nation-state  status 
observation,  was  predicated  on  the  availability  of  adequate  data,  combined  with  the 
requirement  to  capture  sufficient  amounts  of  data  relevant  to  the  current  operational 
environment. Cases for all 182 nation-states are drawn from the years 2004 to 2014, the
11-year period immediately following the United States-led invasion of Iraq in March
2003, for a total of 2,002 individual nation-year cases. This time period was ultimately
selected  based  on  the  findings  of  several  recent  operational  environment  assessments  that 
emphasized  the  importance  of  current  visible  trends  for  meaningful  conflict  prediction 
(Johnson,  2014). 

Description  of  the  Dependent  Variable 

This  study  utilizes  the  conditional  dependent  variable  “Conflict  Transition  given 
Previous  Year  Status”,  which  is  derived  from  the  Heidelberg  Institute  for  International 
Conflict  Research  (HIIK)  conflict  intensity  levels  for  each  nation.  To  understand  how  the 
HIIK  derives  a  nation’s  conflict  intensity  score,  one  must  first  define  a  set  of  conflict 
measures  and  conflict  items  which  constitute  the  key  elements  of  the  score.  The  HIIK 
definitions  for  Conflict  Measures  and  Conflict  Items  are  provided. 

Conflict  Measures 

Conflict measures are actions and communications carried out by
a conflict actor in the context of a political conflict. They are constitutive
for an identifiable conflict if they lie outside established procedures of
conflict regulations and - possibly in conjunction with other conflict
measures - if they threaten the international order or a core function of
the state. Core state functions encompass providing security of a
population, integrity of territory and of a specific political, socioeconomic
or cultural order (Heidelberg Institute for International Conflict Research,
2014).

Conflict  Items 

Conflict  items  are  material  or  immaterial  goods  pursued  by 
conflict  actors  via  conflict  measures.  Due  to  the  character  of  conflict 
measures,  conflict  items  attain  relevance  for  the  society  as  a  whole  - 
either  for  coexistence  within  a  given  state  or  between  states.  This  aspect 
constitutes  the  genuinely  political  dimension  of  political  conflicts 
(Heidelberg  Institute  for  International  Conflict  Research,  2014). 

The  2014  Conflict  Barometer  developed  by  the  HIIK  utilizes  the  10  conflict  items 
described in Table 3.

Table 3: Conflict Items
(Heidelberg Institute for International Conflict Research, 2014)

Conflict Item | Description
System / Ideology | Conflict actor aspires to change the ideological, religious, socioeconomic, or judicial orientation of the political system or regime.
National Power | The power to govern a state.
Autonomy | Attaining or extending political self-rule of a population within a state or of a dependent territory without striving for independence.
Secession | The aspired separation of a part of a territory aiming to establish a new state or to merge with another state.
Decolonization | The desired independence of a dependent territory from foreign rule.
Subnational Predominance | The attainment of de-facto control by a government, a non-state organization, or a population over a territory or a population.
Resources | The pursuit of the possession of natural resources or raw materials, or the profits gained thereof.
Territory | The desire to change the course or alter an international border.
International Power | The change aspired in the power constellation in the international system or regional system therein, especially by changing military capabilities or the political or economic influence of a state.
Other Items | A residual category.

To  determine  a  conflict’s  intensity  level,  the  HIIK  utilizes  five  proxy  measures  to 
assess  the  means  and  consequences  of  the  given  conflict.  The  means  of  conflict  include 
the weapons and personnel involved therein, while the conflict consequences include the
casualties, refugees and internally displaced persons (IDP), and destruction sustained in
said conflict (Heidelberg Institute for International Conflict Research, 2014). The
parameters and assigned values are provided in Table 4.

Table 4: HIIK Proxy Measures

Conflict Consequences \ Conflict Means | 0 Points | 1 Point | 2 Points
0 Points | Violent Crisis | Violent Crisis | Limited War
1 Point | Violent Crisis | Limited War | War
2 Points | Limited War | War | War

Indicator | Low (0 Points) | Medium (1 Point) | High (2 Points)
Personnel | Pax < 50 | 50 < Pax < 400 | Pax > 400
Destruction | Within 0 Dimensions | Within 1-2 Dimensions | Within 3-4 Dimensions
Refugees & IDPs | Ref < 1,000 | 1,000 < Ref < 20,000 | Ref > 20,000
Casualties | Cas < 20 | 20 < Cas < 60 | Cas > 60
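
The scoring logic of Table 4 can be sketched as a pair of lookups. The thresholds and the three-by-three intensity grid are taken from the table; the treatment of values that fall exactly on a threshold (assigned here to the medium band) is an assumption, and all function names are hypothetical. The source does not state how individual indicator points are combined into the overall means and consequences points, so those two totals are taken as inputs.

```python
# Intensity of a violent conflict keyed by (means points, consequences points).
INTENSITY = {
    (0, 0): "Violent Crisis", (0, 1): "Violent Crisis", (0, 2): "Limited War",
    (1, 0): "Violent Crisis", (1, 1): "Limited War", (1, 2): "War",
    (2, 0): "Limited War", (2, 1): "War", (2, 2): "War",
}


def band_points(value, low_limit, high_limit):
    """0 points below low_limit, 2 points above high_limit, otherwise 1 point.

    Assumption: values exactly on a limit are scored as medium (1 point).
    """
    if value < low_limit:
        return 0
    if value > high_limit:
        return 2
    return 1


def personnel_points(personnel):
    return band_points(personnel, 50, 400)


def refugee_points(refugees_and_idps):
    return band_points(refugees_and_idps, 1000, 20000)


def casualty_points(casualties):
    return band_points(casualties, 20, 60)


def violent_intensity(means_points, consequences_points):
    """Look up the intensity label for the two aggregate point scores."""
    return INTENSITY[(means_points, consequences_points)]
```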

The  intensity  levels  of  a  particular  conflict  are  an  attribute  sum  of  the  conflict 
measures  for  a  given  geographic  area  and  time  period.  The  HIIK  employs  a  six-level 
model  with  the  following  intensity  levels:  0  -  No  Conflict,  1  -  Dispute,  2  -  Non-violent 
Crisis,  3  -  Violent  Crisis,  4  -  Limited  War,  5  -  War  (Heidelberg  Institute  for 
International  Conflict  Research,  2014).  Nations  that  were  or  are  currently  experiencing 
multiple  conflicts  are  assigned  an  overall  HIIK  intensity  level  equating  to  the  highest 
level  assigned  to  any  of  the  ongoing  conflicts  for  a  particular  year.  These  levels  were 
subsequently  mapped  to  a  binary  variable  called  “Level  of  Violence”,  with  levels  0 
through  2  mapped  to  “Non-Violent  Conflicts”  and  levels  3  through  5  mapped  to  “Violent 
Conflicts”  (Boekestein,  2015). 
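
The aggregation and binary mapping described above can be sketched in a few lines. The function names are hypothetical, and treating a nation-year with no recorded conflicts as level 0 is an assumption consistent with the "No Conflict" level.

```python
def national_intensity(conflict_levels):
    """Overall HIIK level for a nation-year: the highest level among its conflicts."""
    return max(conflict_levels, default=0)  # assumption: no conflicts means level 0


def level_of_violence(hiik_level):
    """Map HIIK levels 0-2 to non-violent and levels 3-5 to violent."""
    if not 0 <= hiik_level <= 5:
        raise ValueError("HIIK intensity levels range from 0 to 5")
    return "Violent Conflicts" if hiik_level >= 3 else "Non-Violent Conflicts"


# A nation with a dispute (1) and a limited war (4) in the same year.
overall = national_intensity([1, 4])
status = level_of_violence(overall)
```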

The conditional dependent variable “Conflict Transition given Previous Year
Status” is mapped to the level of violence variable for the preceding year (y - 1) and the
level of violence variable for the following year (y). A transition is said to occur if the status
changes over the course of the year. The mapping of the second-order dependent variable
from the HIIK conflict intensity levels is provided in Table 5.


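Table 5: Mapping of Conditional Dependent Variable

HIIK Intensity Level | HIIK Terminology | Level of Violence | Status Year (y - 1) | Conflict Transition Year (y), Given Status Year (y - 1)
0 | No Conflict | Non-Violent Conflicts | Not in Conflict | Transition into Non-conflict (0) or Transition into Conflict (1)
1 | Dispute | Non-Violent Conflicts | Not in Conflict | Transition into Non-conflict (0) or Transition into Conflict (1)
2 | Non-violent Crisis | Non-Violent Conflicts | Not in Conflict | Transition into Non-conflict (0) or Transition into Conflict (1)
3 | Violent Crisis | Violent Conflicts | In Conflict | Transition into Non-conflict (0) or Transition into Conflict (1)
4 | Limited War | Violent Conflicts | In Conflict | Transition into Non-conflict (0) or Transition into Conflict (1)
5 | War | Violent Conflicts | In Conflict | Transition into Non-conflict (0) or Transition into Conflict (1)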
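
The mapping in Table 5 can be sketched as a function that turns a nation's yearly HIIK levels into conditional observations. The record layout, field names, and function name are hypothetical; the threshold of 3 for violent conflict and the 0/1 outcome coding follow the table.

```python
VIOLENT_THRESHOLD = 3  # HIIK levels 3 to 5 count as violent conflict


def conflict_transitions(levels_by_year):
    """Build one record per year that has a prior-year status to condition on."""
    records = []
    for year in sorted(levels_by_year):
        if year - 1 not in levels_by_year:
            continue  # no status for year (y - 1), so no conditional observation
        prior = int(levels_by_year[year - 1] >= VIOLENT_THRESHOLD)
        outcome = int(levels_by_year[year] >= VIOLENT_THRESHOLD)
        records.append({
            "year": year,
            "prior_status": prior,   # 1 = In Conflict, 0 = Not in Conflict
            "outcome": outcome,      # 1 = into conflict, 0 = into non-conflict
            "changed": prior != outcome,
        })
    return records


# Hypothetical HIIK levels for one nation.
example = conflict_transitions({2004: 2, 2005: 3, 2006: 4, 2007: 1})
```

Splitting the records by `prior_status` yields the two conditional data sets, one for nations that were in conflict in the preceding year and one for nations that were not.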

Independent  Variable  Selection 

This study incorporates 26 nation-specific statistical variables and four trend
variables obtained from six data repositories: the Heidelberg Institute for International
Conflict  Research,  The  World  Bank,  Central  Intelligence  Agency  (CIA)  World  Fact 
Book,  Freedom  House,  the  Center  for  Systemic  Peace,  and  the  Food  &  Agriculture 
Organization  of  the  United  Nations  (UN  FAO).  Variable  selection  was  heavily 
influenced  by  the  Center  for  Army  Analysis  FACT  studies,  12  variables  in  common 
(Reed, 2013); the CIA - Goldstone study, three variables in common (Goldstone et al.,
2005); and the Boekestein study, 25 variables in common (Boekestein, 2015). These and
similar studies have repeatedly demonstrated the significance of these variables as
conflict predictors, hence their consideration in this study. Recent studies, such as those
conducted by Hegre at the University of Oslo, have expounded the necessity of including
statistical variables representing emerging trends within the operational environment, such
as military spending, urbanization of populations, loss of natural sources of fresh water,
and burgeoning youth populations, that are seen as future drivers of instability (Hegre et
al., 2011). Therefore, the following five variables are also included for consideration
within this study and are defined as:

Military  expenditure  (Percent  of  central  government  expenditure): 

Military  expenditures  data  from  the  Stockholm  International  Peace  Research 
Institute (SIPRI) are derived from the North Atlantic Treaty Organization (NATO)
definition, which includes all current and capital expenditures on the armed forces,
including peacekeeping forces; defense ministries and other government agencies
engaged  in  defense  projects;  paramilitary  forces,  if  these  are  judged  to  be  trained  and 
equipped  for  military  operations;  and  military  space  activities.  Such  expenditures  include 
military  and  civil  personnel,  including  retirement  pensions  of  military  personnel  and 
social  services  for  personnel;  operation  and  maintenance;  procurement;  military  research 
and  development;  and  military  aid  (World  Bank,  2015). 

Military  expenditure  (Percent  of  gross  domestic  product): 

This  variable  is  defined  in  the  same  fashion  as  above,  but  takes  into  account  the 
relative  defense  expenditure  as  it  relates  to  the  total  national  output. 

Population  ages  0-14  (percent  of  total): 

This  variable  is  based  on  a  nation’s  population  between  the  ages  0  to  14  as  a 
percentage of the total population. Population is based on the de facto definition of
population (World Bank, 2015). This variable is referred to as “Youth Bulge” throughout
this study.

Renewable  internal  freshwater  resources  per  capita  (cubic  meters): 

Renewable  internal  freshwater  resources  flows  refer  to  internal  renewable 
resources  (internal  river  flows  and  groundwater  from  rainfall)  in  the  country.  Renewable 
internal  freshwater  resources  per  capita  are  calculated  using  the  World  Bank's  population 
estimates  (World  Bank,  2015).  Due  to  the  limitations  of  this  data  set,  this  variable  is  the 
average  of  the  2007,  2012,  and  2013  statistics  for  each  nation,  and  it  is  subsequently  fixed 
as  a  stationary  variable. 

Government  Type: 

This  is  a  six-level  indicator  variable  derived  from  the  Polity  IV  scores  for  each 
nation.  Polity  is  defined  as  a  political  or  governmental  organization;  a  society  or 
institution  with  an  organized  government  (Marshall,  Gurr,  &  Jaggers,  2014).  Polity  IV 
scores a nation’s political body on a 21-point scale of -10 (fully autocratic) to 10 (fully
democratic),  with  additional  identifiers  -66  (indicating  foreign  interruption),  -77 
(indicating  anarchy),  and  -88  (indicating  a  transitional  government).  From  these  scores 
the  six  levels  are  defined  as:  Level  0:  Autocratic  Government  (Polity  IV:  -10  to  -6);  Level 
1:  Emerging  Democratic  Government  (Polity  IV:  -5  to  +5);  Level  2:  Democratic 
Government  (+6  to  +10);  Level  3:  Foreign  Interruption  (Polity  IV:  -66);  Level  4: 
Anarchy  (Polity  IV:  -77);  Level  5:  Transitional  Government  (Polity  IV:  -88).  This 
variable  was  included  to  provide  greater  fidelity  when  modeling  political  instability 
within a nation. The Center for Systemic Peace provides polity scores for 166 of the 182
nations considered in this study. To account for this data gap, government type was
correlated to the “Regime Type” variable that is discussed later in this chapter.

Overview  of  Independent  Variables 

Table 6 provides a synopsis of the 30 statistical and trend variables. Several near-year
data sets (for years 2012, 2013, and 2014) are missing. These occurrences result in a
“data  lag”  ranging  between  1  and  2  years,  based  upon  the  year  2014  (forecast  year  0),  for 
14  of  the  30  variables.  In  cases  involving  variables  with  a  data  lag,  the  variable  i  at  year  j 
will be used to predict conflict at year j + lag(i). For example, the variable arable land
has a two-year lag in the data set, so the 2012 data are used to model 2014 conflicts. There
are two serious implications when constructing a predictive model using “lagged” data.
The first implication is that we are attempting to develop a predictive tool using less
current data that may fail to capture, or may completely disregard, current trends that lead to
conflict transitions, thus reducing the accuracy of the model. The second implication is
that  such  data  ultimately  increases  the  overall  variance  in  the  model  due  to  increased 
forecasting  time  horizons.  In  addition  to  the  data  lag,  incomplete  data  sets  (i.e.,  variable- 
year  instances  with  less  than  182  entries)  are  also  pervasive.  Imputation  methods 
employed  to  replace  missing  data  are  covered  later  in  this  chapter. 
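
The lag convention described above can be sketched in a few lines of Python. This is only an illustration: the dictionary `LAG_YEARS` and the helper `predictor_year` are hypothetical names that do not appear in the study, and only the lag values themselves are taken from Table 6.

```python
# Hypothetical lag lookup; the lag values follow Table 6, the names are illustrative.
LAG_YEARS = {
    "arable_land": 2,     # two-year data lag
    "birth_rate": 1,      # one-year data lag
    "gdp_per_capita": 0,  # no data lag
}


def predictor_year(variable: str, conflict_year: int) -> int:
    """Return the data year j whose value is used to predict conflict at year j + lag(i)."""
    return conflict_year - LAG_YEARS[variable]


if __name__ == "__main__":
    for name in LAG_YEARS:
        print(name, predictor_year(name, 2014))
```

Under this convention a longer lag simply shifts the predictor further back in time, which is why lagged variables carry less current information into the forecast year.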
Table 6: Independent Variables

Year of First  Lag       Entries per Year
Data Set       (years)   2012  2013  2014  Variable

World Bank Variables
1961           2          181     -     -  Arable Land (hectares per person)
1961           1          182   182     -  Birth Rate (per 1,000 people)
1961           1          182   182     -  Death Rate (per 1,000 people)
1961           1          182   182     -  Fertility Rate (births per woman)
1960           0          179   179   164  GDP Per Capita (current USD)
1990           0          178   175   175  Improved Water Source (% population with access)
1960           1          182   182     -  Life Expectancy (years)
1990           2          100     -     -  Military Expend (% Gov Spending)
1988           0          179   141   131  Military Expend (% GDP)
1961           0          182   182   182  Infant Mortality Rate (per 1,000 live births)
1961           0          182   182   182  Population ages 0 - 14 (% of total population)
1961           0          181   181   181  Population density (people per square kilometer)
1961           0          181   182   182  Population Growth (annual %)
1990           1          159   160     -  Refugee population by country of asylum (% population)
1990           1          180   181     -  Refugee population by country of origin (% population)
1962           Locked     174   174   174  Renewable Fresh Water per Capita (cubic meters, average of 2004 - 2014 data)
1960           0          168   161   130  Trade (% GDP)
1991           1          171   171     -  Unemployment (total % of labor force)

CIA World Fact Book Variables
2010           0          182   182   182  Border Conflict Score
-              Locked     182   182   182  Regime Type (3 level indicator variable)
-              Locked     182   182   182  Ethnic Diversity (% of Dominant Ethnic Group)
-              Locked     174   174   174  Religious Diversity (% of Dominant Ethnic Group)

Freedom House, The Center for Systemic Peace, and Food & Agriculture Organization of the United Nations Variables
1972           0          180   180   180  Freedom Score (Average of Civil Liberties and Political Rights (scores 0 to 1))
1960           0          166   166   166  Polity IV (Political behavior score -10 to 10, and -66, -77, -88)
1960           0          166   166   166  Government Type (6 level indicator variable derived directly from Polity IV scores)
1961           1           39    39     -  Caloric Intake (average caloric intake from all sources per person)

Trend Variables
1996           1          182   182   182  2 Yr Conflict Intensity Trend (Derived from HIIK intensity levels)
-              1          179   180   180  2 Yr Freedom Trend (Derived from Freedom Score)
-              1          179   180   180  3 Yr Freedom Trend (Derived from Freedom Score)
-              1          180   181   181  5 Yr Freedom Trend (Derived from Freedom Score)

A dash indicates that no entry is listed.


A  majority  of  the  independent  variables  are  self-explanatory  in  both  origin  and 
function;  however,  the  derivation  of  several  key  statistical  and  trend  variables  requires 
further  discussion. 

Border  Conflict  Score 

Conflicts  in  bordering  states  are  cited  as  a  variable  of  interest  in  both  the  CIA- 
Goldstone  (as  a  binary  indicator  variable)  and  the  Boekestein  studies.  The  developed 
border  conflict  score  seeks  to  model  the  external  pressures  applied  to  a  nation  as  a 
function  of  HIIK  intensity  level  of  a  nation’s  bordering  neighbors  for  a  given  year,  and 
the relative proportion of the international border attributed to each of those nations. The
international border data was obtained from the CIA World Fact Book. The equation for
calculating the Border Conflict Score is defined in Equation 1.

\[ Cb_{ij} = \sum_{i=1}^{n} x_{ij} \, p_i \quad \forall j \]

Equation 1: Border Conflict Score (Boekestein, 2015)

Where:

Cb_{ij} = Conflict in border states statistic
n = number of bordering nations
x_{ij} = HIIK intensity level for nation i for year j
p_i = percent of border shared with nation i
i = Country ∈ {1, 2, ..., 182}
j = Years ∈ {1996, 1997, ..., 2014}

The  border  conflict  score  for  Afghanistan  in  2014  is  provided  as  an  example  of 
the  variable  calculation  in  Table  7. 

Table 7: Border Conflict Score Example

Afghanistan Border Conflict Score 2014

Bordering State    Border (km)    p_i      x_ij
China                       91    0.015    4
Iran                       921    0.154    3
Pakistan                  2670    0.446    5
Tajikistan                1357    0.227    3
Turkmenistan               804    0.134    1
Uzbekistan                 144    0.024    2

Border Conflict Score (Cb_ij)     3.61
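
The Afghanistan example can be reproduced with a short Python sketch of Equation 1. The border lengths and HIIK intensity levels are copied from Table 7; the function and variable names are illustrative assumptions and are not taken from the study's own tooling.

```python
# Border length (km) and 2014 HIIK intensity level for each neighbor, from Table 7.
AFGHANISTAN_2014 = {
    "China": (91, 4),
    "Iran": (921, 3),
    "Pakistan": (2670, 5),
    "Tajikistan": (1357, 3),
    "Turkmenistan": (804, 1),
    "Uzbekistan": (144, 2),
}


def border_conflict_score(neighbors: dict[str, tuple[float, int]]) -> float:
    """Sum each neighbor's intensity level weighted by its share of the total border."""
    total_border = sum(length for length, _ in neighbors.values())
    return sum(
        (length / total_border) * intensity
        for length, intensity in neighbors.values()
    )


if __name__ == "__main__":
    print(round(border_conflict_score(AFGHANISTAN_2014), 2))
```

Because the border shares sum to one, the score is a weighted average of the neighbors' intensity levels and therefore stays on the same scale as the HIIK levels themselves.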

Regime Type

Regime Type is a three-level indicator variable that was first cited in the CIA-Goldstone
study as a significant predictor of political instability. The Boekestein study
was the first to employ the variable in its current simplified form after mapping the 57
government descriptions provided in the CIA World Fact Book first to 10 and subsequently
to three nominal classes, as shown in Table 8 (Boekestein, 2015).

Table 8: Mapping of Regime Type (Boekestein, 2015)

Expanded Regime Type

Class                      Total
Communist                      4
Dictatorship                   2
Military Junta                 1
Monarchy                      24
Theocracy                      2
Democracy                     39
Republic                     107
Transitional Government        2
Disputed                       1
Grand Total                  182

Reduced Regime Type

New Class                                          Total
Central/Ruling Party                                  36
Democratic                                           137
Emerging, Transitional, recent change, disputed        9
Grand Total                                          182

The three levels of this variable are mapped as: Level 0: Central rule / ruling
party; Level 1: Emerging, transitional, or disputed; Level 2: Democratic government.
Unlike the Government Type indicator variable, Regime Type is locked, meaning that it
cannot change from year to year. Regime Type is correlated with the new dynamic
indicator variable Government Type, which is envisioned as the primary means for modeling
political institutions. Regime Type is retained within this study as a
modeling alternative to the new variable.

Freedom Score


The statistical variable Freedom Score was first identified as a significant variable
during the Boekestein study, which sought to develop a variable that incorporated the
highly correlated aspects of the Civil Liberties and Political Rights variables aggregated
by the Freedom House database (Boekestein, 2015). Freedom House has compiled this
data set since 1972, and it currently covers 195 nations and 15 disputed territories
(Freedom House, 2015). For 2015, Freedom House adopted a new scheme for its two
variables, which it believes provides more nuanced information than the older 7-point
scoring system; Freedom House now scores Political Rights on a 40-point scale and
Civil Liberties on a 60-point scale (Freedom House, 2015).

As  in  the  Boekestein  study,  this  analysis  combines  Political  Rights  and  Civil 
Liberties  to  create  the  variable  Freedom  Score  by  taking  the  average  of  the  normalized 
scores  for  each  nation-year  instance.  Scores  were  normalized  to  remove  bias  attributed  to 
having  an  uneven  dual  scoring  system  utilized  by  Freedom  House.  The  derivation  of  the 
Freedom  Score  is  provided  in  Equations  2,  3,  and  4. 


\[ nPr_{ij} = \frac{Pr_{ij}}{40} \]

Equation 2: Normalized Political Rights

\[ nCl_{ij} = \frac{Cl_{ij}}{60} \]

Equation 3: Normalized Civil Liberties

\[ FS_{ij} = \frac{nPr_{ij} + nCl_{ij}}{2} \]

Equation 4: Freedom Score

Where:

FS_{ij} = Freedom score for country i in year j
Pr_{ij} = Political rights score for country i in year j
nPr_{ij} = Normalized political rights score for country i in year j
Cl_{ij} = Civil liberties score for country i in year j
nCl_{ij} = Normalized civil liberties score for country i in year j
i = Country ∈ {1, 2, ..., 182}
j = Years ∈ {1996, 1997, ..., 2014}
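
A minimal Python sketch of the Freedom Score derivation is given below. It assumes raw Political Rights and Civil Liberties scores on the 40-point and 60-point scales described above, and the function name `freedom_score` is a hypothetical label rather than part of the study's tooling. The example inputs are invented for illustration.

```python
def freedom_score(political_rights: float, civil_liberties: float) -> float:
    """Average the two Freedom House scores after scaling each to the 0-1 range."""
    n_pr = political_rights / 40.0  # Equation 2: 40-point Political Rights scale
    n_cl = civil_liberties / 60.0   # Equation 3: 60-point Civil Liberties scale
    return (n_pr + n_cl) / 2.0      # Equation 4: mean of the normalized scores


if __name__ == "__main__":
    # Hypothetical nation-year with mid-range scores on both scales.
    print(freedom_score(20, 30))
```

Normalizing before averaging keeps the 60-point Civil Liberties scale from outweighing the 40-point Political Rights scale in the combined score.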

Conflict  and  Freedom  Trend  Variables 

Trend  variables  seek  to  predict  conflict  transitions  through  modeling  the  change 
in  trajectory  of  a  specific  nation’s  conflict  intensity  levels  and  freedom  scores.  Previous 
conflict  prediction  studies  have  successfully  employed  trend  variables  as  indicators  of 
instability. Due to the nature of their calculations, all trend variables experience a
one-year lag in the model.

Change  in  HIIK  conflict  intensity  is  modeled  as  a  two-year  trend  variable  dividing 
the  change  in  HIIK  intensity  levels  for  the  years  in  question  by  the  number  of  intensity 
levels,  as  shown  in  Equation  5.  The  objective  of  this  variable  is  the  improvement  of 
conflict  transition  forecasting  through  the  forecasting  of  increased  or  decreased  levels  of 
violence. 


\[ 2YCIT_{ij} = \frac{HIL_{i,j-1} - HIL_{i,j-2}}{6} \]

Equation 5: Two Year HIIK Trend Variable

Where:

2YCIT_{ij} = Two year conflict intensity trend for country i in year j
HIL_{ij} = HIIK intensity level for country i in year j
i = Country ∈ {1, 2, ..., 182}
j = Years ∈ {1996, 1997, ..., 2014}
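
The two-year conflict intensity trend of Equation 5 can be illustrated with a small Python sketch. The function name and the example intensity series are hypothetical; only the formula (the change between years j-2 and j-1, divided by 6) comes from the text.

```python
def two_year_conflict_trend(hil_by_year: dict[int, int], year: int) -> float:
    """Equation 5: change in HIIK intensity from year j-2 to year j-1, divided by 6."""
    return (hil_by_year[year - 1] - hil_by_year[year - 2]) / 6.0


if __name__ == "__main__":
    # Hypothetical intensity levels for one nation.
    example = {2012: 3, 2013: 5}
    print(two_year_conflict_trend(example, 2014))
```

Because the value for year j depends only on years j-1 and j-2, the variable is available one year before the conflict state it is used to predict, which matches the one-year lag noted for all trend variables.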

Like  the  HIIK  conflict  intensity  trend  variable,  the  two-,  three-,  and  five-year 
freedom  trends  also  seek  to  forecast  conflict  transitions  through  the  modeling  of  a 
nation’s  Polity  functions.  In  addition  to  the  two-year  trend  variable,  three-  and  five-year 
variables  are  also  included  to  improve  forecasting  over  longer  time  horizons  as  shown  in 
Equation  6. 

\[ 2YFT_{ij} = FS_{i,j-2} - FS_{i,j-1} \]
\[ 3YFT_{ij} = FS_{i,j-3} - FS_{i,j-1} \]
\[ 5YFT_{ij} = FS_{i,j-5} - FS_{i,j-1} \]

Equation 6: Two-, Three-, and Five-Year Freedom Trend Variables

Where:

2YFT_{ij} = Two-year freedom trend for country i in year j
3YFT_{ij} = Three-year freedom trend for country i in year j
5YFT_{ij} = Five-year freedom trend for country i in year j
FS_{ij} = Freedom score for country i in year j
i = Country ∈ {1, 2, ..., 182}
j = Years ∈ {1996, 1997, ..., 2014}

Database  Design  and  Construction 
Data  Base  Criteria 

The  design  of  the  Conditional  Conflict  Database  (CCD)  facilitates  the  eventual 
construction of the conditional logistic regression and Markov models and consists of two
sub-databases: the “In Conflict” and “Not in Conflict” databases. The “In Conflict” database
includes all instances of nations transitioning from a state of conflict (either remaining in
conflict  or  transitioning  out  of  conflict),  while  the  “Not  in  Conflict”  database  includes  all 
instances  of  transitioning  from  a  state  of  non-conflict.  The  CCD  meets  three  design 
criteria  essential  for  the  development  of  studies  using  logistic  regression  and  Markov 
models:  1  -  Common  nomenclature  and  time  frame  across  all  datasets;  2  -  Automated 
raw-data  refreshment;  and  3  -  Easily  searchable/sortable  by  nation,  year-group,  region, 
etc.  The  objective  of  the  database  design  is  the  creation  of  six  region  specific  databases 
which  are  used  to  develop  the  conditional  logistic  regression  models. 

Issues 

The  primary  obstacle  in  the  creation  of  the  master  database  was  the  sorting, 
cataloguing,  and  formatting  of  the  over  30  disparate  databases  that  are  loaded  into  the 
CCD.  Between  all  datasets  there  exist  338  separate  entries  for  nations,  regions,  and 
territories  (NRT),  of  which  only  182  are  considered  in  this  study.  Additionally,  a 
transliteration  system  was  developed  to  ensure  a  common  naming  convention  for  all  338 
NRTs, in addition to the unique catalogue numbers (1 through 338) assigned to each
entity.  A  uniform  database  structure  based  on  that  used  by  the  World  Bank  is  employed 
to  format  the  raw  databases  and  segregate  the  “top”  182  nations-of-interest,  creating  the 
usable  structures  which  are  loaded  into  the  CCD. 

The master database requires a total of 78,078 separate entries to properly
catalogue the 2,002 separate nation-year instances included in this study. Manual
database updates are cumbersome, time-consuming, and prone to human error. To
overcome this obstacle, a consolidated database tool based on Microsoft Office Visual Basic
for Applications (VBA) was developed to compile the 39 separate identifying-information and data
spreadsheets into one consolidated file, which is subsequently time-stamped with the
most recent compile date. This tool enables timely and error-free data updates of the
CCD for any dataset conforming to the World Bank format.

Data  Imputation 

As  shown  in  Table  6,  the  raw  datasets  employed  for  this  study  had  numerous 
instances of missing data. Since the study considers 182 of the world’s nations, data-year
sets containing fewer than 182 data points require data imputation prior to final
consolidation in the CCD. In general, nations with fledgling or unstable governments
lack the ability to track and consolidate the large amounts of statistical data required for
this study. For the data considered in this study, a total of 1,602 or 80% of the nation-year
instances had between 28 and 30 of the 30 possible variables, with an average of 28.5
variables per nation-year instance. However, within the considered dataset, there exist 32
nation-year instances that have fewer than 23 of the 30 possible variables; the complete list
is provided in Table 9.

Table 9: Number of Variables per Nation-year Instance; Worst Data


A  total  of  2,903  of  the  62,062  statistical  data  points  required  imputation  prior  to 
final  consolidation  in  the  CCD.  The  JMP  statistical  software  package  was  employed  to 
impute  the  missing  data.  JMP  imputes  missing  data  points  by  analyzing  values  in  other 
columns  and  rows,  developing  an  estimate  of  the  missing  value(s)  (Hinrichs  &  Boiler, 
2010).  Imputed  values  are  expectations  conditioned  on  the  non-missing  values  of  each 
row  in  the  data  set  (SAS  Institute,  2015).  Two  separate  data  imputation  methods,  isolated 
variable  and  holistic  imputation  (using  entire  data  set),  were  conducted  and  compared  to 
identify the optimal variables to import into the master CCD. The final imputation
method selection was based on the statistical similarity (average) of the imputed data to
the raw data for nation-year instances within the same region. Additionally, the
imputation of the “Polity IV” and “Government Type” data was based on the “Regime
Type” variable. Table 10 provides the list of variables requiring data imputation as well
as the method of imputation employed.

Table 10: Variables Requiring Data Imputation

Variable Name                        Imputation Method
Arable Land                          Holistic Imputation
GDP Per Capita                       Isolated Variable
Improved Water                       Holistic Imputation
Military Expend (% Gov Spending)     Holistic Imputation
Military Expend (% GDP)              Holistic Imputation
Population density                   Isolated Variable
Population Growth                    Holistic Imputation
Refugee (Asylum)                     Isolated Variable
Refugee (Origin)                     Isolated Variable
Fresh Water per Capita               Holistic Imputation
Trade (% GDP)                        Holistic Imputation
Unemployment                         Holistic Imputation
Polity IV                            Based on regime type
Government Type                      Based on regime type
Caloric Intake                       Holistic Imputation
Freedom Score                        Holistic Imputation
2 Yr Freedom Trend                   Holistic Imputation
3 Yr Freedom Trend                   Holistic Imputation
5 Yr Freedom Trend                   Holistic Imputation
Religious Diversity                  Holistic Imputation

Conditional  Conflict  Database  Structure 

The  “In  Conflict”  and  “Not  in  Conflict”  CCDs  share  a  common  database  structure 
that  includes  the  catalogue  number,  standard  name  and  code,  the  base  year,  transition 
year-pair,  the  year  code,  supporting  HIIK  data,  region,  and  all  statistical  data  from  2004 
to 2014. The database also provides summary statistics concerning the total instances
and transitions of interest included within the dataset. An example of the CCD structure
is provided in Figure 3.


[Figure 3 header fields: Left Node: Conflict - Conflict, Conflict - No Conflict; Total Instances: 731; Total Transitions (out of conflict): 111]

Figure 3: In Conflict Database

Creation  of  the  CCD  requires  that  all  instances  of  conflict  transition,  from  one 
year  to  another,  are  identified  and  catalogued  according  to  whether  a  transition  from  their 
current state occurred for the preceding year. This formulation results in the possibility
that specific nation-year transition instances may be included in both the “In
Conflict” and “Not in Conflict” databases.
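
One way to picture the cataloguing step is the following Python sketch, which splits a single nation's yearly conflict states into the two conditional sets. It is an illustration under stated assumptions, not the study's VBA tool: the function name, the record layout, and the rule that the base-year state decides which database receives a year-pair are all assumptions made for this example.

```python
def split_transitions(states: dict[int, int]) -> tuple[list, list]:
    """Split yearly states (1 = in conflict, 0 = not in conflict) into two lists.

    Each record is (base_year, next_year, next_state); the base-year state
    decides which list receives the record (an assumption of this sketch).
    """
    in_conflict, not_in_conflict = [], []
    years = sorted(states)
    for base, nxt in zip(years, years[1:]):
        if nxt != base + 1:
            continue  # skip gaps so every record is a one-year transition
        record = (base, nxt, states[nxt])
        if states[base] == 1:
            in_conflict.append(record)
        else:
            not_in_conflict.append(record)
    return in_conflict, not_in_conflict


if __name__ == "__main__":
    # Hypothetical state history for one nation.
    history = {2004: 1, 2005: 1, 2006: 0, 2007: 0, 2008: 1}
    in_c, not_c = split_transitions(history)
    print(in_c)
    print(not_c)
```

In this sketch the third field of each record plays the role of the binary response, so each list can be passed directly to its own conditional model.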

Regional  Assignments 

The  practice  of  creating  region-specific  conflict  prediction  models  has  been 
employed  in  several  previous  studies.  These  studies  have  shown  a  relationship  between 
the  duration  and  scope  of  violent  conflict  and  the  significance  of  regional  commonalities 
such  as  the  incidence  of  natural  resources,  physical  geography,  adjacent  border  conflicts, 
and  population  demographics  (Buhag,  2005).  Additionally,  it  has  been  shown  that 
conflict risk factors such as poverty, famine, and despotism tend to cluster in so-called
“Bad Neighborhoods”, with an observable cross-border effect (Hegre, Karlsen, Nygard,
Strand, & Urdal, 2011). The regional assignments utilized in this study are based on the
six-region  world  model  developed  in  the  Boekestein  model,  and  are  comprised  of:  Sub- 
Sahara  Africa,  South  and  East  Asia,  Eastern  Europe  and  Central  Asia,  Arab  and  North 
African States, Latin America, and the Organization for Economic Cooperation and
Development  (OECD)  nations  (Boekestein,  2015).  The  number  of  nations  assigned  to  a 
specific  region  ranges  from  17  (Arab  &  North  African  states)  to  49  (Sub-Sahara  Africa). 
The  regional  assignments  for  the  182  nations  considered  in  this  study  are  provided  in 
Appendix  A. 

3.3  Logistic  Regression 

Overview  of  Logistic  Regression  Concepts  and  Theory 

Logistic  regression  was  employed  as  the  regression  method  for  this  study  due  to 
the  binary  response  of  the  conditional  dependent  variable,  where  a  nation  given  its 
current  status  either  “Transitions  /  Remains  in  Conflict”  or  “Transitions  /  Remains  out  of 
Conflict”.  As  in  any  regression  model,  the  goal  of  this  analysis  is  to  construct  the  best 
fitting,  parsimonious,  and  operationally  interpretable  model  to  describe  the  relationship 
between  the  dependent  and  independent  variables  (Hosmer,  Lemeshow,  &  Sturdivant, 
2013). Linear regression is not used since the dichotomous nature of the data used in this
study violates many of the assumptions required for linear regression, specifically those of
measurement (dependent variable is continuous and unbounded), homoscedasticity
(constant residual variance over regressor hull), and normality (residuals are normally
distributed) (Hosmer, Lemeshow, & Sturdivant, 2013). The measurement assumption is
violated through the use of the dichotomous variable that is constrained to 0 or 1. This in
turn violates the normality assumption of the distribution of errors, which for any given
set of covariate values can assume only two possible values. Finally, the homoscedasticity
assumption is violated due to non-constant variance of the error terms associated with each instance.

In  logistic  regression,  the  conditional  mean  of  the  dichotomous  response  is 
bounded between 0 and 1, or simply 0 < E(Y|x) < 1. This results in the non-constant
variance  discussed  previously  as  the  response  approaches  0  or  1  producing  the  “S-curve” 
shown  in  Figure  4.  The  curve  itself  resembles  the  plot  of  a  continuous  distribution  of  a 
random  variable,  leading  to  the  use  of  the  logistic  distribution  to  model  the  conditional 
mean  for  a  dichotomous  response  (Hosmer,  Lemeshow,  &  Sturdivant,  2013). 


[Figure 4: two panels plotted against Age Category (x); the second panel is titled “Logit Transformation g(x) vs. Independent Variable”]

Figure 4: Plots of the Logit π(x) and Logit Transformation g(x)

The Logit and Logit Transformation

While  other  continuous  distributions  can  adequately  model  dichotomous  data,  the 
logistic  distribution  provides  superior  mathematical  flexibility  in  conjunction  with 
operationally  meaningful  estimates  of  the  covariate  effects.  For  the  purposes  of  this 
study the conditional mean will be represented as π(x) = E(Y|x), where the logit π(x)
represents the probability that the response is equal to 1, or for the purposes of this study a
“Transition into Conflict,” given the covariate(s). The specific form of the logistic model
is given in Equation 7.

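\[ \pi(x) = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n}} = \frac{e^{g(x)}}{1 + e^{g(x)}} \]

Equation 7: The General Logistic Regression Model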

The  general  logistic  regression  model  effectively  ensures  that  the  probability 
estimate of conflict transition is bounded between 0 and 1. The error ε associated with the
model assumes a binomial distribution: when Y = 1 the error is ε = 1 − π(x), which occurs
with a probability of π(x), and when Y = 0 the error is ε = −π(x), which occurs with a
probability of 1 − π(x). These properties of the error term result in a binomial distribution
with the properties E(ε|x) = 0 and Var(ε|x) = π(x)[1 − π(x)] (Hosmer, Lemeshow, &
Sturdivant, 2013).

Central  to  the  development  of  this  study’s  logistic  regression  models  is  the 
concept of the logit transformation g(x). The logit encompasses many of the desirable
properties of the linear regression model, such as a continuous, unbounded response that
is linear in its parameters, as shown in Figure 4 (Hosmer, Lemeshow, & Sturdivant,
2013). The logit transformation is calculated by taking the natural logarithm of the odds,
π(x)/[1 − π(x)], as presented in Equation 8.


\[ g(x) = \ln\left[\frac{\pi(x)}{1 - \pi(x)}\right] = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n \]

Equation 8: The Logit Transformation
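
The relationship between Equations 7 and 8 can be checked numerically with a short Python sketch. The function names and the coefficient values in the example are hypothetical, and the code is not numerically hardened for very large linear predictors; it is meant only to show that the logit transformation inverts the logistic model.

```python
import math


def linear_predictor(betas: list[float], x: list[float]) -> float:
    """g(x) = beta_0 + beta_1*x_1 + ... + beta_n*x_n."""
    return betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))


def logistic(g: float) -> float:
    """Equation 7: pi(x) = e^g / (1 + e^g), always strictly between 0 and 1."""
    return math.exp(g) / (1.0 + math.exp(g))


def logit(p: float) -> float:
    """Equation 8: g(x) = ln[pi / (1 - pi)], the natural log of the odds."""
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    # Hypothetical intercept and single coefficient.
    g = linear_predictor([-1.0, 0.5], [2.0])
    p = logistic(g)
    print(g, p, logit(p))
```

A linear predictor of zero maps to a probability of one half, and increasingly positive or negative predictors push the probability toward 1 or 0 without ever reaching either bound.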

The covariate parameters β_i are estimated through the method of maximum
likelihood  which  seeks  to  determine  the  estimates  of  the  covariate  parameters  that  agree 
most  closely  with  the  observed  data  of  the  response  (Hosmer,  Lemeshow,  &  Sturdivant, 
2013).  As  with  the  error  terms,  each  sample  observation  follows  a  binomial  distribution 
with  the  likelihood  function  given  by  Equation  9. 

\[ l(\beta) = \prod_{i=1}^{n} \pi(x_i)^{y_i} \, [1 - \pi(x_i)]^{1 - y_i} \]

Equation 9: Likelihood Function

Where:

β = (β_0, β_1, ..., β_n)
π(x_i) = i-th response probability
y_i = i-th response observation

The principle of maximum likelihood simply seeks to maximize the expression
provided in Equation 9. However, the use of the log-likelihood function provided in
Equation 10 provides a simpler means of estimating the covariate parameters (Hosmer,
Lemeshow, & Sturdivant, 2013).

\[ L(\beta) = \sum_{i=1}^{n} \left\{ y_i \ln[\pi(x_i)] + (1 - y_i) \ln[1 - \pi(x_i)] \right\} \]

Equation 10: Log-likelihood Function
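
The likelihood and log-likelihood functions can be compared directly in a small Python sketch. The function names and the example responses and probabilities are hypothetical; the point of the example is only that the logarithm of the likelihood product equals the log-likelihood sum, which is why maximizing either yields the same parameter estimates.

```python
import math


def likelihood(y: list[int], p: list[float]) -> float:
    """Likelihood function: product of p_i^y_i * (1 - p_i)^(1 - y_i)."""
    result = 1.0
    for yi, pi in zip(y, p):
        result *= pi if yi == 1 else (1.0 - pi)
    return result


def log_likelihood(y: list[int], p: list[float]) -> float:
    """Log-likelihood function: sum of y_i*ln(p_i) + (1 - y_i)*ln(1 - p_i)."""
    return sum(
        yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
        for yi, pi in zip(y, p)
    )


if __name__ == "__main__":
    # Hypothetical observed responses and fitted probabilities.
    y_obs = [1, 0, 1]
    p_fit = [0.8, 0.3, 0.6]
    print(likelihood(y_obs, p_fit), log_likelihood(y_obs, p_fit))
```

Working with the sum rather than the product also avoids the numerical underflow that a product of many probabilities would cause for large samples.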


Testing  for  Model  and  Coefficient  Significance 

As  in  linear  regression,  the  basic  premise  for  determining  the  significance  of  any 
logistic  regression  model  is  comparing  the  model  containing  the  covariates  of  interest  to 
the  model  without  those  parameters  via  hypothesis  testing.  The  comparison  method  in 
logistic regression is the likelihood ratio test, which assumes a Chi-square (χ²)
distribution. In order to conduct the hypothesis tests using the likelihood ratios, we must
calculate the deviance in the likelihood values of the saturated and fitted models. The
deviance (D) statistic is shown in Equation 11.


\[ D = -2 \sum_{i=1}^{n} \left[ y_i \ln\left(\frac{\pi(x_i)}{y_i}\right) + (1 - y_i) \ln\left(\frac{1 - \pi(x_i)}{1 - y_i}\right) \right] \]

Equation 11: Deviance of the Saturated and Fitted Models


Given that the likelihood l(β) of the saturated model (i.e., the model containing
as many parameters as there are data points) is equal to 1.0, it follows that the deviance is equal to
D = −2 ln[likelihood of the fitted model]. It should be noted that the deviance statistic
has  the  same  function  in  logistic  regression  as  the  residual  sum-of-squares  (SSE)  does  in 
linear  regression  (Hosmer,  Lemeshow,  &  Sturdivant,  2013). 

To assess the significance of the covariate in question, the statistic G, negative
two times the natural log of the ratio of the likelihoods without and with the variable in question, is
calculated. The statistic G has the same function in logistic regression as the numerator
of the partial F-test does in linear regression (Hosmer, Lemeshow, & Sturdivant, 2013).
The statistic G, which assumes a Chi-square distribution, can be calculated either from the
ratio of likelihoods between the different models as shown in Equation 12, or as the
difference between the deviances of the two models as shown in Equation 13.

\[ G = -2 \ln\left[\frac{\text{Likelihood without variable}}{\text{Likelihood with variable}}\right] \]

Equation 12: Likelihood Ratio Method

\[ G = D(\text{Model without variable}) - D(\text{Model with variable}) \]

Equation 13: Difference in deviances method
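
The equivalence of the two methods for computing G can be illustrated with a brief Python sketch. The responses and fitted probabilities below are invented for the example, the function names are hypothetical, and the deviance uses the simplification stated above that the saturated model has a likelihood of 1 for 0/1 responses.

```python
import math


def likelihood(y: list[int], p: list[float]) -> float:
    """Likelihood of 0/1 responses y under fitted probabilities p."""
    value = 1.0
    for yi, pi in zip(y, p):
        value *= pi if yi == 1 else 1.0 - pi
    return value


def deviance(y: list[int], p: list[float]) -> float:
    """D = -2 ln(likelihood of the fitted model), taking the saturated likelihood as 1."""
    return -2.0 * math.log(likelihood(y, p))


def g_from_likelihoods(y, p_without, p_with) -> float:
    """Likelihood ratio method (Equation 12)."""
    return -2.0 * math.log(likelihood(y, p_without) / likelihood(y, p_with))


def g_from_deviances(y, p_without, p_with) -> float:
    """Difference in deviances method (Equation 13)."""
    return deviance(y, p_without) - deviance(y, p_with)


# Hypothetical responses and fitted probabilities for two nested models.
Y = [1, 0, 1, 0]
P_WITHOUT = [0.5, 0.5, 0.5, 0.5]  # intercept-only model
P_WITH = [0.8, 0.3, 0.7, 0.2]     # model including the variable in question

if __name__ == "__main__":
    print(g_from_likelihoods(Y, P_WITHOUT, P_WITH))
    print(g_from_deviances(Y, P_WITHOUT, P_WITH))
```

Both functions return the same value up to floating-point rounding, since the difference of two log-likelihoods is the log of their ratio.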

Figure  5  presents  a  likelihood  ratio  hypothesis  test  using  JMP  software  output. 
Model significance for a given confidence level (1−α)% is determined through a standard
hypothesis test wherein the null hypothesis (H0), that the intercept-only model is sufficient, is
tested against the alternate hypothesis (Ha), that the model with the additional variable(s) has
more explanatory power. The G statistic is compared against the Chi-square critical value χ²(1−α, n), for a
given confidence level and n degrees of freedom, where n is the difference in the number of
variables between the two models. In this example the null hypothesis is rejected if
G > 7.815.

H0: The model only containing the intercept is sufficient

Ha: The model with the additional variable(s) has more explanatory power

Reject H0 if: G > χ²(1−α, n) = 7.815

Figure 5: Hypothesis Test for Model Significance

Variable significance for a given confidence level (1−α)% is determined through a
standard hypothesis test similar to that discussed previously. In this case, the Wald
statistic (W), which follows a Chi-square distribution with one degree of freedom, is used
as the test statistic. The null hypothesis (H0), that the variable does not significantly
contribute to the model, is tested against the alternate hypothesis (Ha), that the variable
significantly contributes to the model. The W statistic is compared against the Chi-square
critical value χ²(1−α, 1), for a given confidence level and one degree of freedom. In
this test the null hypothesis is rejected if W > χ²(1−α, 1). Figure 6 provides an example of
such a test. In this case, “Refugee Asylum” is identified as a significant variable, while
the intercept, “Trade (% GDP)”, and “Religious Diversity” fail the test at a 0.05 level of
significance.
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H0:  The  variable  does  not  significantly  contribute  to  the  model 
Ha:  The  variable  significantly  contributes  to  the  model 


O11-"'  >*!-*,!  drf 

=  -5.S41 

W 

=  Wald  Test  Statistic 

d  Parameter  Estimates 

/ 

Term 

Estimate 

Std  Error 
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13449431 
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Figure  6:  Hypothesis  Test  for  Covariate  Significance 
Model  Building  Strategies 

The Purposeful Selection of Covariates method is the model building strategy
employed throughout this study. The strategy entails a seven-step, iterative process that
individually analyzes each of the independent variables, fits and analyzes a preliminary
effects model, assesses covariate interaction, and assesses the fit and adequacy of the
main effects model (Hosmer, Lemeshow, & Sturdivant, 2013). The methodology guiding
the purposeful selection of covariates is provided in Table 11.

Table 11: Purposeful Selection of Covariates Methodology

Step 1: Univariate assessment of all candidate variables
    Candidate variables for the first multivariable model are selected based on the
    univariate test p-value. Include if p < 0.25.

Step 2: Fit a multivariable model containing all covariates identified in step 1
    Assess the importance of each covariate using its p-value analyzed at traditional
    levels. Eliminate all variables (one at a time) that do not significantly contribute
    to the model.

Step 3: Assessment of initial covariate estimates
    Compare the values of the estimated coefficients in the reduced model, built during
    step 2, to the original model identified in step 1. Identify any variable whose
    Δβ > 20%, as this indicates that one or more excluded variables are important in
    providing adjustment to the effect of the variable in question and should be added
    back into the model.

Step 4: Add each variable not selected in step 1 to the model obtained at the conclusion
of step 3
    Variables are added one at a time, checking for variable significance using the
    p-value. The final model produced in step 4 is referred to as the preliminary main
    effects model.

Step 5: Construct the Main Effects Model
    For each continuous variable, check the assumption of logit linearity as a function
    of the covariate.

Step 6: Check for covariate interaction within the Main Effects Model
    Create a list of possible pairs of variables that have a realistic possibility of
    interacting. This can include the various levels of categorical variables.
    Interaction terms are added and tested one at a time for significance in a
    univariable model. Significant interaction terms are added to the Main Effects Model.

Step 7: Assess model adequacy and fit
    Assess model adequacy using the Hosmer-Lemeshow Goodness of Fit test, analysis of
    classification tables, and the receiver operating characteristic curve.

The first step entails fitting separate univariate logistic regression models for each
variable. The significance of each variable is assessed based on the standard chi-square
test. Candidate variables for the initial multivariate model are screened and selected
based on p-values less than or equal to 0.25. This relaxed selection criterion allows for
the inclusion of possibly significant variables that may not have been included in the model
otherwise (Hosmer, Lemeshow, & Sturdivant, 2013).

The second and third steps involve fitting the initial multivariate model
containing all the variables identified in the first step and assessing model and covariate
significance, followed by the systematic removal, one variable at a time, and analysis of
variables based upon the study's desired significance level (p = 0.05 was the standard
significance level employed throughout this study). As part of the systematic analysis of
variables in the second and third steps, the coefficient values of the variables remaining
in the model were compared before and after the removal of each variable. If the change in
coefficient value (Δβ_j) for any remaining variable exceeded 20% in either direction, this
indicated the possible importance of the removed variable within the model; the
variable was subsequently added back into the model on the next iteration.

During the fourth step, variables initially excluded from consideration are
systematically added back into the model, one at a time, and tested for significance; the
result is the preliminary effects model. In the fifth step, each continuous covariate within
the preliminary effects model is checked for logit linearity. If a covariate is found to
behave in a nonlinear fashion, appropriate transformations are applied and tested. During
the sixth step, covariate interaction is tested for significance within the model.
Interaction between two variables implies that the effect of each variable is not constant
over the levels of the other variable (Hosmer, Lemeshow, & Sturdivant, 2013). Ultimately,
the final decision to include interaction terms in the main effects model must be based on
the statistical significance of the interaction term and on practical considerations, such as
whether the interaction term improves the model and whether it is operationally relevant.
Following the addition of significant interaction terms to the preliminary effects model, the
systematic model reduction of variables described in the second step is repeated with the
coefficients of the main effects locked. The model constructed at the end of the sixth step
is known as the main effects model (Hosmer, Lemeshow, & Sturdivant, 2013).

Assessing  Model  Fit  and  Adequacy 

Assessing model fit and adequacy is the seventh and final step of the
purposeful selection of covariates method. However, the various methods employed in this
step warrant further discussion in a separate section within this chapter. Three
methods are employed in concert to provide a holistic assessment of the fit and adequacy
of the conditional logistic regression models in this study: the
Hosmer-Lemeshow Goodness of Fit Test, classification tables, and the area under the
curve for model-specific receiver operating characteristic curves.

The Hosmer-Lemeshow goodness of fit test assesses the overall fit of the estimated
probabilities π(x_j) across population sub-groups through the use of the Hosmer-Lemeshow
statistic C. Two grouping strategies are generally employed; the first is based on the
percentiles of the estimated probabilities, and the second is based on fixed values of the
same probabilities (Hosmer, Lemeshow, & Sturdivant, 2013). In general the population is
broken into 10 sub-groups (g), but more or fewer can be used depending on the data set.
For each sub-group, the squared differences between the observed and expected counts,
scaled by the expected count, are calculated for both success “1” and failure “0” responses
and added together. The summation of the sub-group specific statistics is known as the
Hosmer-Lemeshow goodness of fit statistic (C) and is presented in its entirety in
Equation 14 (Hosmer, Lemeshow, & Sturdivant, 2013).

$$C = \sum_{k=1}^{g} \left[ \frac{(o_{1k} - e_{1k})^2}{e_{1k}} + \frac{(o_{0k} - e_{0k})^2}{e_{0k}} \right]$$

Equation 14: Hosmer-Lemeshow Test Statistic

Where:

g = Number of sub-groups

o_{1k} = Number of observed "1" or success responses within the kth sub-group

e_{1k} = Number of expected "1" or success responses within the kth sub-group

o_{0k} = Number of observed "0" or failure responses within the kth sub-group

e_{0k} = Number of expected "0" or failure responses within the kth sub-group

π̄_k = The average estimated probability in the kth sub-group

Like other logistic regression test statistics, C follows a chi-square distribution,
in this case with g - 2 degrees of freedom, where g is the number of sub-groups employed
within the goodness of fit test. Model fit is assessed through a standard hypothesis test
in which the null hypothesis (H0), that the model fits the data, is tested against the
alternate hypothesis (Ha), that there is little evidence of model fit. The C statistic is
compared against the chi-square critical value χ²(1-α, g-2) for a given confidence level
and g - 2 degrees of freedom. In this test the null hypothesis is not rejected if
χ²(1-α, g-2) > C. An example of the Hosmer-Lemeshow hypothesis test for α = 0.05 is shown
in Figure 7.


H0: The model appears to fit the data

Ha: There is little evidence of model fit

Decision Rule: Reject H0 if C > T.S.

C = 0.236

T.S. = χ²(0.95, 1) = 3.841

P{χ² > C} = 0.627

Test Result: Fail to Reject H0; the model appears to fit the data well.

Figure  7:  Hosmer-Lemeshow  Hypothesis  Test 
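
The calculation of C can be sketched in a few lines of Python. This is an illustrative sketch only: the six observations and three sub-groups are hypothetical toy data, the function name is an assumption, and grouping is done by rank of estimated probability (the percentile strategy). It is not the JMP procedure used in this study.

```python
from typing import List, Sequence, Tuple


def hosmer_lemeshow(y: Sequence[int], p: Sequence[float], g: int = 10) -> Tuple[float, int]:
    """Return (C, degrees of freedom) for g sub-groups of roughly equal size,
    formed after ordering observations by estimated probability.
    Assumes every sub-group has non-zero expected successes and failures."""
    pairs: List[Tuple[float, int]] = sorted(zip(p, y))
    n = len(pairs)
    c_stat = 0.0
    for k in range(g):
        group = pairs[k * n // g:(k + 1) * n // g]
        n_k = len(group)
        o1 = sum(obs for _, obs in group)      # observed successes
        e1 = sum(prob for prob, _ in group)    # expected successes
        o0, e0 = n_k - o1, n_k - e1            # observed and expected failures
        c_stat += (o1 - e1) ** 2 / e1 + (o0 - e0) ** 2 / e0
    return c_stat, g - 2


# Hypothetical toy data: observed responses and estimated probabilities.
y_obs = [0, 0, 0, 1, 1, 1]
p_hat = [0.1, 0.1, 0.5, 0.5, 0.9, 0.9]

c_value, dof = hosmer_lemeshow(y_obs, p_hat, g=3)
CRITICAL_1DF_95 = 3.841  # chi-square critical value for one degree of freedom
print(f"C = {c_value:.3f} with {dof} degree(s) of freedom")
print("Reject H0" if c_value > CRITICAL_1DF_95 else "Fail to reject H0")
```

With three sub-groups the test has one degree of freedom, which is why the 3.841 critical value applies in this toy case.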

Classification tables, or confusion matrices, gauge the adequacy of logistic
regression models by depicting the total number of true-positive, false-positive,
true-negative, and false-negative responses as they relate to the total population.
The true-positive and true-negative rates are referred to as model sensitivity and model
specificity, respectively. These tables are the result of cross-classifying the observed
dichotomous response variable with a dichotomous variable derived from the estimated
probability π(x_i) (Hosmer, Lemeshow, & Sturdivant, 2013). The cross-classification depends
on a probability cut-point: estimated probabilities that fall below the cut-point are
assigned to the “0” or failure response, and those greater than the cut-point are assigned
to the “1” or success response; in general the cut-point is initially set at 0.5. An example
of a standard classification table is presented in Table 12.




Table 12: Classification Table

Standard Classification Table

                                          Observed
Classified                                Transition to    Remain/Transition out    Total
                                          Conflict = 1     of Conflict = 0
Transition to Conflict = 1                5                1                        6
Remain/Transition out of Conflict = 0     4                116                      120
Total                                     9                117                      126

Model Cut Point: 0.50
Model Accuracy: 0.960

As can be observed in Table 12, the model sensitivity, or true-positive rate
P(ŷ_i = 1 | y_i = 1), is given by the five correctly predicted “1” responses out of a total
of nine occurrences (55.6%). Additionally, the model specificity, or true-negative rate
P(ŷ_i = 0 | y_i = 0), is given by the 116 correctly classified “0” responses out of a total
of 117 occurrences (99.1%), given a cut-point equal to 0.50. The overall model accuracy is
the proportion of correctly classified responses to the total number of observations, as
provided in Equation 15.

$$\text{Model Accuracy} = \frac{\sum \text{True Positive Obs} + \sum \text{True Negative Obs}}{\sum \text{Observations}}$$

Equation  15:  Logistic  Regression  Model  Accuracy 
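
The figures quoted for Table 12 can be reproduced with a short Python sketch. The cell counts are taken from Table 12; the variable names are illustrative.

```python
# Cell counts from Table 12 (rows: classified, columns: observed).
true_positive = 5     # classified 1, observed 1
false_positive = 1    # classified 1, observed 0
false_negative = 4    # classified 0, observed 1
true_negative = 116   # classified 0, observed 0

total = true_positive + false_positive + false_negative + true_negative

sensitivity = true_positive / (true_positive + false_negative)   # 5 of 9
specificity = true_negative / (true_negative + false_positive)   # 116 of 117
accuracy = (true_positive + true_negative) / total               # Equation 15

print(f"Sensitivity = {sensitivity:.3f}")
print(f"Specificity = {specificity:.3f}")
print(f"Accuracy    = {accuracy:.3f}")
```

The high accuracy is driven almost entirely by the majority class; the sensitivity shows that four of the nine observed transitions were missed at this cut-point.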

The final method used to gauge the overall adequacy of the logistic regression
model is the total area under the curve (AUC) for receiver operating characteristic (ROC)
curves. Unlike classification tables, which depend on a single cut-point, ROC curves
provide a more comprehensive description of model adequacy over the entire
range of cut-points (Hosmer, Lemeshow, & Sturdivant, 2013). The ROC curve,
whose use originates from signal detection theory, provides a means to measure the model’s
(receiver’s) ability to detect true responses (signal) in the presence of noise. The graph
of the ROC curve plots the model’s probability of detecting a true signal (sensitivity) as a
function of the probability of detecting a false signal (1 - specificity) over the entire
range of cut-point values, as shown in Figure 8 (Hosmer, Lemeshow, & Sturdivant, 2013).


Figure  8:  Receiver  Operating  Characteristic  Curve 

The ROC area under the curve ranges from 0.5 to 1.0 and provides a measure of
the model’s ability to effectively discriminate between observations experiencing the
outcome of interest and those that do not (Hosmer, Lemeshow, & Sturdivant, 2013).
Models with low AUC values nearing 0.50 are said to have little to no discrimination
capacity; that is, the model provides little predictive benefit over that of a coin toss.
While there is no set standard for gauging the adequacy of model discrimination, the
criteria provided in Table 13 set the guidelines for logistic regression model analysis
employed in this study.


Table 13: Discrimination Measures

(Hosmer, Lemeshow, & Sturdivant, 2013)

AUC                  General Guidelines
AUC = 0.5            No Model Discrimination
0.5 < AUC < 0.7      Poor Discrimination
0.7 ≤ AUC < 0.8      Acceptable Discrimination
0.8 ≤ AUC < 0.9      Excellent Discrimination
AUC ≥ 0.9            Superior Discrimination
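
As an illustration of how an AUC value is obtained and read against these guidelines, the sketch below computes the AUC as the proportion of (success, failure) pairs in which the success observation receives the higher estimated probability, with ties counted as one half. The data are hypothetical, the function names are assumptions, and the treatment of the boundary values 0.7, 0.8, and 0.9 is an assumption as well.

```python
from typing import Sequence


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of estimated probabilities."""
    pos = [pi for pi, yi in zip(p, y) if yi == 1]
    neg = [pi for pi, yi in zip(p, y) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def discrimination_label(value: float) -> str:
    """Map an AUC value to the Table 13 guideline (boundary handling assumed)."""
    if value >= 0.9:
        return "Superior Discrimination"
    if value >= 0.8:
        return "Excellent Discrimination"
    if value >= 0.7:
        return "Acceptable Discrimination"
    if value > 0.5:
        return "Poor Discrimination"
    return "No Model Discrimination"


# Hypothetical observed responses and estimated probabilities.
y_obs = [0, 0, 1, 0, 1, 1]
p_hat = [0.10, 0.30, 0.35, 0.60, 0.70, 0.90]

auc_value = auc(y_obs, p_hat)
label = discrimination_label(auc_value)
print(f"AUC = {auc_value:.3f}: {label}")
```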

Interpretation  of  the  Logistic  Regression  Model 

The odds ratio can be used to approximate another measure known as the relative
risk, which is the ratio of outcome probabilities (Hosmer, Lemeshow, & Sturdivant,
2013). The concept of relative risk can be related to the classification table and to model
sensitivity/specificity, as shown in Figure 9.


Standard Classification Table

                                          Observed
Classified                                Transition to Conflict = 1         Remain/Transition out of Conflict = 0
Transition to Conflict = 1                e^(β0+β1) / (1 + e^(β0+β1))        e^(β0) / (1 + e^(β0))
Remain/Transition out of Conflict = 0     1 / (1 + e^(β0+β1))                1 / (1 + e^(β0))
Total                                     1                                  1

Figure 9: Relative Risk Relation to Classification Table

Interpretation of the effects of significant covariates, as they relate to conflict
transition, is central to the research questions and operational relevancy of this study. For
this purpose, logistic regression analysis employs the concept of the odds ratio (OR),
a measure of association that indicates how much more or less likely, in terms of odds, the
dichotomous response is when a given covariate effect is present than when it is absent.
Equation 16 provides the derivation of the univariate odds ratio.


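$$OR = \frac{\left(\dfrac{e^{\beta_0+\beta_1}}{1+e^{\beta_0+\beta_1}}\right) \Big/ \left(\dfrac{1}{1+e^{\beta_0+\beta_1}}\right)}{\left(\dfrac{e^{\beta_0}}{1+e^{\beta_0}}\right) \Big/ \left(\dfrac{1}{1+e^{\beta_0}}\right)} = e^{\beta_1}$$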

Equation  16:  Univariate  Odds  Ratio 
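
Equation 16 can be checked numerically. The sketch below uses hypothetical coefficients (β0 = -1 and β1 = ln 2, chosen only so that the odds ratio equals 2) and also computes the relative risk for comparison; none of these values come from the models in this study.

```python
import math


def logistic(z: float) -> float:
    """Estimated probability for a given logit value."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p: float) -> float:
    """Odds corresponding to a probability."""
    return p / (1.0 - p)


# Hypothetical coefficients, for illustration only.
beta0 = -1.0
beta1 = math.log(2.0)

p_present = logistic(beta0 + beta1)   # covariate effect present (x = 1)
p_absent = logistic(beta0)            # covariate effect absent (x = 0)

odds_ratio = odds(p_present) / odds(p_absent)   # Equation 16
relative_risk = p_present / p_absent            # ratio of outcome probabilities

print(f"Odds ratio    = {odds_ratio:.4f} (e^beta1 = {math.exp(beta1):.4f})")
print(f"Relative risk = {relative_risk:.4f}")
```

In this toy case the outcome is not rare, so the odds ratio overstates the relative risk; the two measures move closer together as the outcome probabilities become small.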


The basic interpretation of the odds ratio is illustrated in the following example.
If a certain covariate has an odds ratio of 2, the odds of experiencing the outcome of
interest when the covariate effect is present are twice the odds when it is absent.
Conversely, if the odds ratio is 0.5, the odds of experiencing the outcome of interest when
the covariate effect is present are half of those when it is absent. The mechanics and
interpretation of multivariate model odds ratios are very similar to those of the univariate
method, which is presented here to demonstrate the basic premises of the concept.

Overview of JMP Software and Output

JMP is a statistical software package developed by SAS Institute that is
employed to construct and analyze the logistic regression models for this study. For this
reason, a brief discussion of the JMP model output is warranted. Figure 10 displays the
JMP whole model test for significance. The JMP output displays the log-likelihood
values for both the Full and Reduced models, as well as the chi-square distributed G
statistic. This interface enables the analyst to quickly ascertain the significance of the
overall model through visual inspection of the p-value, given as
“Prob>ChiSq” in the JMP interface. For the purposes of this study, the threshold for
model significance was set at p-values < 0.05, the JMP default threshold of
significance (Hinrichs & Boiler, 2010).


Figure  10:  JMP  Whole  Model  Test 

The JMP environment also provides estimates of coefficients combined with the
overall significance of the variables included in the model. As with the whole model test,
the threshold for covariate significance was set at p-values < 0.05. The JMP estimates of
the covariate coefficients (β_i) are provided in the first column of Figure 11. It should be
observed that these estimates have the opposite sign when included in the final model.
For example, the logit transformation for the Figure 11 estimates is given as:
g(x) = 25.247 - 2.13 × 10^-5 x1 - 0.087 x2 - 21.558 x3.


Parameter Estimates

Term                   Estimate       Std Error     ChiSquare    Prob>ChiSq
Intercept              -25.24666      13.925276     3.29         0.0698
Refugee (Asylum)       2.13222e-5     1.0112e-5     4.45         0.0350*
Trade (% GDP)          0.08733891     0.0489827     3.18         0.0746
Religious Diversity    21.5581165     13.449431     2.57         0.1090

Figure  11:  JMP  Parameter  Estimates 
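
The ChiSquare column in Figure 11 is consistent with the Wald statistic computed as the squared ratio of each estimate to its standard error. The following sketch recomputes it from the Figure 11 values; the function names are illustrative, and the p-value uses the identity P(χ² with 1 df > w) = erfc(sqrt(w/2)) rather than any JMP routine.

```python
import math

# (estimate, standard error) pairs transcribed from Figure 11.
estimates = {
    "Intercept": (-25.24666, 13.925276),
    "Refugee (Asylum)": (2.13222e-5, 1.0112e-5),
    "Trade (% GDP)": (0.08733891, 0.0489827),
    "Religious Diversity": (21.5581165, 13.449431),
}
CRITICAL_1DF_95 = 3.841  # chi-square critical value, one degree of freedom


def wald(estimate: float, std_error: float) -> float:
    """Wald statistic W = (estimate / standard error)^2."""
    return (estimate / std_error) ** 2


def p_value_1df(w: float) -> float:
    """Upper-tail probability of a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(w / 2.0))


results = {term: (wald(b, se), p_value_1df(wald(b, se)))
           for term, (b, se) in estimates.items()}
significant = [term for term, (w, _) in results.items() if w > CRITICAL_1DF_95]

for term, (w, p) in results.items():
    print(f"{term:<20} W = {w:5.2f}  p = {p:.4f}")
print("Significant at 0.05:", significant)
```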

Synthetic  Minority  Over-sampling  Technique 

The nature of predicting conflict transitions, which are decidedly rare events,
results in significantly unbalanced conditional conflict data sets; a data set is said to be
unbalanced if the classification categories are not approximately equally represented
(Chawla et al., 2002). Because the covariate coefficients are estimated by the method of
maximum likelihood, an imbalanced data set favors the response, success or failure, that
forms the majority of the observations. This results in a tendency to misclassify the
observations of interest, i.e., conflict transitions, in favor of the majority response, no
transition from current status. To compensate for this phenomenon, the Synthetic Minority
Over-sampling Technique (SMOTE) was utilized to enable development of the conditional
logistic regression model for a single specific region, which experienced significant issues
with misclassification.

The SMOTE methodology over-samples the minority class through the creation of
“synthetic” observations (Chawla et al., 2002). Synthetic observations are generated in the
feature space of the observations, along the line segments joining each minority
observation to its k nearest minority-class neighbors (in this study k = 5). The synthetic
examples added to the original data set result in the creation of larger and less specific
decision regions, which allows better training on the minority class (Chawla et al., 2002).
Synthetic data points utilized in this study were generated using a MATLAB sub-routine
based on the pseudo-code provided in Figure 12.

Algorithm SMOTE(T, N, k)

Input: Number of minority class samples T; Amount of SMOTE N%; Number of nearest
       neighbors k
Output: (N/100) * T synthetic minority class samples

(* If N is less than 100%, randomize the minority class samples as only a random
   percent of them will be SMOTEd. *)
if N < 100
    then Randomize the T minority class samples
         T = (N/100) * T
         N = 100
endif
N = (int)(N/100)  (* The amount of SMOTE is assumed to be in integral multiples of 100. *)
k = Number of nearest neighbors
numattrs = Number of attributes
Sample[ ][ ]: array for original minority class samples
newindex: keeps a count of number of synthetic samples generated, initialized to 0
Synthetic[ ][ ]: array for synthetic samples
(* Compute k nearest neighbors for each minority class sample only. *)
for i <- 1 to T
    Compute k nearest neighbors for i, and save the indices in the nnarray
    Populate(N, i, nnarray)
endfor

Populate(N, i, nnarray)  (* Function to generate the synthetic samples. *)
while N != 0
    Choose a random number between 1 and k, call it nn. This step chooses one of
    the k nearest neighbors of i.
    for attr <- 1 to numattrs
        Compute: dif = Sample[nnarray[nn]][attr] - Sample[i][attr]
        Compute: gap = random number between 0 and 1
        Synthetic[newindex][attr] = Sample[i][attr] + gap * dif
    endfor
    newindex++
    N = N - 1
endwhile
return  (* End of Populate. *)

End of Pseudo-Code.

Figure  12:  SMOTE  Pseudo-code 

(Chawla  et  al,  2002) 
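
A compact Python sketch of the Figure 12 pseudo-code is given below. It is not the MATLAB sub-routine used in this study: the function name, the Euclidean distance measure, the fixed random seed, and the two-attribute toy data are all assumptions made for illustration. As in the pseudo-code, a separate random gap is drawn for each attribute.

```python
import math
import random
from typing import List


def smote(samples: List[List[float]], n_percent: int, k: int, seed: int = 0) -> List[List[float]]:
    """Generate (n_percent / 100) * len(samples) synthetic minority samples."""
    rng = random.Random(seed)
    pool = [list(s) for s in samples]        # original minority class samples
    seed_indices = list(range(len(pool)))
    if n_percent < 100:
        # Only a random fraction of the minority samples is SMOTEd.
        rng.shuffle(seed_indices)
        seed_indices = seed_indices[: int(n_percent / 100 * len(pool))]
        n_percent = 100
    per_sample = n_percent // 100            # integral multiples of 100 assumed
    synthetic: List[List[float]] = []
    for i in seed_indices:
        # Indices of the k nearest minority neighbors of sample i.
        neighbors = sorted(
            (j for j in range(len(pool)) if j != i),
            key=lambda j: math.dist(pool[i], pool[j]),
        )[:k]
        for _ in range(per_sample):
            nn = pool[rng.choice(neighbors)]
            # Move a random fraction of the way toward the neighbor, per attribute.
            synthetic.append([a + rng.random() * (b - a) for a, b in zip(pool[i], nn)])
    return synthetic


# Hypothetical minority class: 16 samples with two attributes each.
data_rng = random.Random(1)
minority = [[data_rng.random(), data_rng.random()] for _ in range(16)]

synthetic_points = smote(minority, n_percent=300, k=5)
print(len(synthetic_points), "synthetic samples generated")
```

With 16 minority samples and N = 300, the sketch yields 48 synthetic samples, and every synthetic point lies within the range spanned by the original minority samples.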

Construction  of  Regional  Logistic  Regression  Models 
Model  Dataset  Overview 

Six regional logistic regression models, each consisting of two sub-models
conditioned on a nation’s conflict status in the year prior to the transition, were
developed for this study. The use of “In Conflict” and “Not in Conflict” conditional models
is a requirement for the subsequent development of the nation-specific Markov conflict
transition models. The “In Conflict” models include all instances of nations in conflict
in year i - 1 that either remain in conflict or transition out of conflict in year i.
Similarly, the “Not in Conflict” models include all instances of nations not in conflict in
year i - 1 that remain out of conflict or transition into conflict in year i. The data set
utilized for the training and validation models covered the years 2004 to 2013; data for
year 2014 were reserved for Markov model development due to the lack of HIIK conflict
data for year 2015. The summary statistics for the regional model data are provided in
Table 14.

Table 14: Summary Statistics of Regional Model Data

Regional Model                     Sub-model          Number of Cases    Number of Transitions    Transition Rate (%)
Sub-Saharan Africa                 In Conflict        228                37                       16.2%
Sub-Saharan Africa                 Not In Conflict    262                42                       16.0%
South and East Asia                In Conflict        123                19                       15.4%
South and East Asia                Not In Conflict    157                19                       12.1%
Eastern Europe and Central Asia    In Conflict        117                19                       16.2%
Eastern Europe and Central Asia    Not In Conflict    166                23                       13.9%
Arab & North African States        In Conflict        95                 6                        6.3%
Arab & North African States        Not In Conflict    75                 14                       18.7%
Latin America                      In Conflict        95                 19                       20.0%
Latin America                      Not In Conflict    174                26                       14.9%
OECD                               In Conflict        75                 11                       14.7%
OECD                               Not In Conflict    255                13                       5.1%
World View (Totals of Regions)     In Conflict        733                111                      15.1%
World View (Totals of Regions)     Not In Conflict    1089               137                      12.6%

On  average,  the  “In  Conflict”  models  experience  transitions  of  interest  (i.e. 
transitions  out  of  conflict)  in  15.1%  of  all  cases,  while  the  “Not  in  Conflict”  models 
experience  transitions  into  conflict  in  12.6%  of  all  cases.  These  average  transition  rates 
were  instrumental  in  the  identification  of  the  training  and  validation  data  sets,  which 
sought  to  maintain  these  rates  for  model  development. 
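
The “World View” rates follow directly from the regional counts in Table 14, as the short Python sketch below shows. The counts are transcribed from Table 14; the data structure and names are illustrative.

```python
# (number of cases, number of transitions) per region and sub-model, from Table 14.
table14 = {
    "Sub-Saharan Africa": {"In Conflict": (228, 37), "Not In Conflict": (262, 42)},
    "South and East Asia": {"In Conflict": (123, 19), "Not In Conflict": (157, 19)},
    "Eastern Europe and Central Asia": {"In Conflict": (117, 19), "Not In Conflict": (166, 23)},
    "Arab & North African States": {"In Conflict": (95, 6), "Not In Conflict": (75, 14)},
    "Latin America": {"In Conflict": (95, 19), "Not In Conflict": (174, 26)},
    "OECD": {"In Conflict": (75, 11), "Not In Conflict": (255, 13)},
}

world = {}
for sub_model in ("In Conflict", "Not In Conflict"):
    cases = sum(region[sub_model][0] for region in table14.values())
    transitions = sum(region[sub_model][1] for region in table14.values())
    world[sub_model] = (cases, transitions, 100.0 * transitions / cases)

for sub_model, (cases, transitions, rate) in world.items():
    print(f"{sub_model:<16} cases = {cases}, transitions = {transitions}, rate = {rate:.1f}%")
```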

Design  of  Training  and  Validation  Data  Sets 

The overarching concept guiding the selection of the training and validation data
sets was to identify the data subsets, for each conditional model, that provided transition
rates comparable to the “World View” averages shown in Table 14. In general, this
principle was adhered to for every model except the Arab and North African “In
Conflict” model and the OECD “Not in Conflict” model, which experienced below
average out-of-state transition rates of 6.3% and 5.1% respectively. Training and
validation data sets were initially standardized across all models, with year sets 2004 to
2010 specified for the training models and year sets 2011 to 2013 specified for model
validation. However, during the construction of the twelve conditional models, it became
clear that a standardized year set across regions resulted in the development of
sub-optimal conditional models. Continuous analysis and model refinement during the
construction of the conditional models resulted in the selection of the model-specific data
year sets provided in Table 15. The final selection of data year sets is predicated on
balancing the competing requirements of maintaining individual model transition rates on par
with world averages and constructing models that adequately predict the rare events of
interest.


Table 15: Nodal Model Training and Validation Year Sets

Regional Model                     Sub-model          Training Year Set    Validation Year Set    Markov Year Set
Sub-Saharan Africa                 In Conflict        2004-2010            2011-2013              2014
Sub-Saharan Africa                 Not In Conflict    2004-2010            2011-2013              2014
South and East Asia                In Conflict        2004-2010            2011-2013              2014
South and East Asia                Not In Conflict    2008-2011            2012-2013              2014
Eastern Europe and Central Asia    In Conflict        2006-2011            2012-2013              2014
Eastern Europe and Central Asia    Not In Conflict    2004-2010            2011-2013              2014
Arab & North African States        In Conflict        2004-2010            2011-2013              2014
Arab & North African States        Not In Conflict    2004-2009            2010-2013              2014
Latin America                      In Conflict        2008-2011            2012-2013              2014
Latin America                      Not In Conflict    2004-2010            2011-2013              2014
OECD                               In Conflict        2004-2010            2011-2013              2014
OECD                               Not In Conflict    2005-2009            2010-2013              2014

Issues Encountered with Rare Event Prediction

As stated previously, the method of maximum likelihood estimation tends to favor
the majority of occurrences in unbalanced data sets, resulting in the misclassification of
rare events. All twelve nodal models experienced minor-to-moderate detrimental impacts
to model prediction accuracy as a result of this phenomenon. The careful selection of
training and validation year sets enabled the mitigation of misclassification issues in 11
of the 12 nodal models. However, all initial Eastern Europe & Central Asia “Not in
Conflict” models failed to properly classify a single conflict transition in any of the
validation models. To correct this deficiency, the Synthetic Minority Over-sampling
Technique was utilized to produce 48 additional minority instances (transitions from
no-conflict into conflict) for the training model data set. The additional data points were
generated using a SMOTE algorithm developed for the MATLAB modeling environment,
with the 16 conflict transitions from year sets 2004 to 2010 as the primary input
(MathWorks, 2015). Following generation, the 48 instances were analyzed to ensure
completeness and similarity to the original variables. It was noted that design variables
such as “Government Type” and “Regime Type” were approximated as continuous
variables with values ranging from 0.20 to 0.70. To correct this issue, values less than
0.50 were rounded down to 0, while those greater than or equal to 0.50 were rounded up
to 1, ensuring that only one level of each variable was assigned to a particular instance.
Five separate models were developed using the SMOTE training set, all of which predicted at
least one of the seven observed conflict transitions from year set 2011-2013. The final
nodal models for each region are discussed and analyzed in Chapter IV.
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3.4  Markov  Models 


Overview 

The use of nation-specific Markov models as a forecasting tool for conflict and
conflict transitions is intended to provide operationally relevant and tractable analysis
of future conflict trends. This study uses the two-state Markov model depicted in
Figure 13, which provides the probabilities of conflict transition for the following year
given the current conflict status of the nation in question. The Markov model base year
(year 0) for this study is fixed at 2014, corresponding to the data provided in the most
recent HIIK conflict barometer. The transition probabilities for the base year are
calculated using the conditional logistic regression models and applied to all 182
nations in the 2014 data set. The Markov models provide insight into expected
transition times, mean recurrence times, and the long-run proportion of time that a
nation will remain in a particular status. Ultimately, they enable operationally relevant
global conflict forecasting with prediction horizons greater than one year.
[Figure 13 labels. P01: probability that a state not in conflict transitions into conflict. P10: probability that a state in conflict transitions out of conflict.]

Figure 13: Nation Specific Conflict Transition Markov Model

The Markov Process

A discrete-time Markov process is a stochastic process $\{X_n, n = 0, 1, 2, \ldots\}$
that takes on a finite or countable number of possible values, which for the purposes of
this study are labeled by the non-negative integers. If $X_n = i$, the process is said to be
in state $i$ at time $n$ (Ross, 2014). Given that the system is currently in state $i$ at
time $n$, there exists a fixed probability $P_{ij}$ that the system will transition to state
$j$ at time $n + 1$, as shown in Equation 17.

$$P\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_1 = i_1, X_0 = i_0\} = P\{X_{n+1} = j \mid X_n = i\} = P_{ij}$$

Equation  17:  Markov  Chain  (Ross,  2014) 
This equation may be interpreted as follows: the conditional probability of any future
state $X_{n+1}$, given the past states and the present state $X_n$, is independent of all
previous states and depends only on the current state (Ross, 2014). $P_{ij}$ is known as
the one-step transition probability and can also be written as
$P_{ij} = P_{ij}^{1} = P_{ij}^{(1)}$, a form of notation that is useful when dealing with
n-step transition probabilities. It follows that the sum of the probabilities of all
transition options, given a current state, is equal to 1, as shown in Equation 18.

$$\sum_{j=0}^{\infty} P_{ij} = 1, \quad \forall\, i = 0, 1, 2, \ldots$$

Equation  18:  Summation  of  Transition  Probabilities  for  a  Current  State 

This feature of the Markov state probabilities is illustrated in the generic two-state
Markov chain ($P$) depicted in Equation 19.

$$P = \begin{bmatrix} \alpha & 1 - \alpha \\ 1 - \beta & \beta \end{bmatrix}$$

Equation  19:  Generic  Two-state  Markov  Chain 

In this example the probability of remaining in state “0”, given that the system is
currently in state “0”, is $P_{00} = \alpha$, while the probability of transitioning from
state “0” to state “1” is $P_{01} = 1 - \alpha$. Similarly, the probability of
transitioning to state “0” from state “1” is $P_{10} = 1 - \beta$, and the probability of
remaining in state “1” is $P_{11} = \beta$. For the purposes of this study, the following
state transition probabilities are defined.


$P_{00}$ = Probability that a nation not in conflict remains out of conflict
$P_{01}$ = Probability that a nation not in conflict transitions into conflict
$P_{10}$ = Probability that a nation in conflict transitions out of conflict
$P_{11}$ = Probability that a nation in conflict remains in conflict
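
Because each row of the transition matrix must sum to 1, a two-state chain is fully determined by the two off-diagonal probabilities. The Python sketch below is illustrative only; the helper name `two_state_matrix` is hypothetical, and the numeric inputs are the base-year values that appear in the Afghanistan panel of Figure 14.

```python
def two_state_matrix(p01, p10):
    """Build [[P00, P01], [P10, P11]] from the two off-diagonal probabilities."""
    for p in (p01, p10):
        if not 0.0 <= p <= 1.0:
            raise ValueError("transition probabilities must lie in [0, 1]")
    # Each row must sum to 1, so the diagonal entries are the complements.
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


# Base-year values read from the Afghanistan panel of Figure 14.
P = two_state_matrix(p01=0.0395699, p10=1.5314e-05)
row_sums = [sum(row) for row in P]
print(P)
print(row_sums)
```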

A brief discussion of the accessibility of states within a Markov model is
warranted before we proceed further. State $j$ is said to be accessible from state $i$ if
$P_{ij}^{n} > 0$ for some $n \geq 0$ (Ross, 2014). Additionally, the states of the models
employed in this study are said to communicate, since each is always accessible from the
other. This is germane to this study, as all included nations have the potential to
transition from one state to another (i.e., there exist no probabilities such that
$P_{ij}^{n} = 0$). However, as will be discussed later, numerous nations have transition
probabilities approaching 0. This condition produces some very interesting phenomena when
analyzing the stability, recurrence, and long-run proportions of nations, as discussed in
Chapter IV.

Chapman-Kolmogorov  Equations 

The Chapman-Kolmogorov equations provide a method for computing the
probability that a system currently in state $i$ will be in state $j$ after $n$ additional
transitions (Ross, 2014). The n-step transition probability is a direct extension of the
one-step transition probability discussed previously and is shown in Equation 20.

$$P_{ij}^{n} = P\{X_{n+k} = j \mid X_k = i\}$$

Equation  20:  Markov  n-step  probability 

Computation of the n-step transition probabilities occurs via the sum-product of the
n-step and m-step transition probabilities over all intermediate states $k$, as shown in
Equation 21.

$$P_{ij}^{n+m} = \sum_{k=0}^{\infty} P_{ik}^{n} P_{kj}^{m}$$

Equation  21:  Chapman- Kolmogorov  Equation 

The product of transition probabilities $P_{ik}^{n} P_{kj}^{m}$ represents the probability
that, starting in state $i$, the process will transition to state $j$ in $n + m$
transitions through a path that passes through state $k$ at the nth transition
(Ross, 2014). This concept is subsequently adapted to the calculation of the n-step
transition probability matrix $P^{(n)}$ through the use of matrix multiplication, as shown
in Equation 22.

$$P^{(n+m)} = P^{(n)} \cdot P^{(m)}$$

Equation  22:  Transition  probabilities  for  the  n-step  matrix 

The use of matrix multiplication to determine simultaneously the transition
probabilities of each state for a given time period is an extremely powerful concept and
forms the basis of the conflict forecasting and analysis tool developed for this study.
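
The matrix form of the Chapman-Kolmogorov equations can be checked numerically. The Python sketch below is a minimal illustration under stated assumptions: the transition matrix is a made-up two-state example rather than one of the study's models, and the helper names `mat_mul` and `mat_pow` are hypothetical. It confirms that the (n + m)-step matrix equals the product of the n-step and m-step matrices.

```python
def mat_mul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(P, n):
    """Return the n-step transition matrix P^(n) by repeated multiplication."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


# Hypothetical two-state chain, used only to illustrate the identity.
P = [[0.90, 0.10],
     [0.25, 0.75]]
n, m = 2, 3
lhs = mat_pow(P, n + m)                      # P^(n+m)
rhs = mat_mul(mat_pow(P, n), mat_pow(P, m))  # P^(n) * P^(m)
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(lhs)
print(max_gap)
```

Repeated multiplication is used here for transparency; for long horizons an exponentiation-by-squaring routine would need fewer multiplications.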

Sojourn  Times  and  Variance 

Relevant to the forecasting of conflict transitions is the expected time to the first
conflict transition; that is, given that a nation is currently in state $i$, what is the
expected time $R_j$ until it is in state $j$? The expected time to first transition,
measured from a designated time 0, is calculated by taking the inverse of the probability
that the system, currently in state $i$, will transition into state $j$. The expected time
to the first transition and its variance are calculated using Equation 23.

$$E[R_1] = \frac{1}{p_{01}^{0}}, \qquad E[R_0] = \frac{1}{p_{10}^{0}}$$


Equation  23:  Expected  Time  to  First  Transition  and  Variance 


Where: 

$R_1$ = Time to first transition (non-conflict to conflict)

$R_0$ = Time to first transition (conflict to non-conflict)

$p_{01}^{0}$ = Transition probability at time zero (non-conflict to conflict)

$p_{10}^{0}$ = Transition probability at time zero (conflict to non-conflict)

It  should  be  remembered  that,  due  to  the  memoryless  properties  of  the  Markov 
model,  time  0  is  relative  and  can  be  designated  at  any  point  in  time. 
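
As a worked illustration of the expected time to first transition, the Python sketch below treats the waiting time as geometric with a constant yearly transition probability. The mean, 1/p, follows the description above. The variance formula, (1 - p)/p^2, is the standard result for a geometric waiting time and is an assumption here, since it is not stated explicitly in the text. The function name is hypothetical, and the input is Albania's base-year non-conflict to conflict probability from Figure 14.

```python
def first_transition_stats(p):
    """Mean and variance of the number of years until the first transition.

    Assumes a constant yearly transition probability p, i.e. a geometric
    waiting time.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    mean = 1.0 / p
    variance = (1.0 - p) / p ** 2  # standard geometric result (assumption)
    return mean, variance


# Albania, base-year probability of moving from non-conflict to conflict.
mean_years, var_years = first_transition_stats(0.0755922)
print(round(mean_years, 2), round(var_years, 2))
```

Note that very small transition probabilities, such as those of nations that are strongly "locked" into one state, produce very large expected waiting times under this formula.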

Recurrence  and  Long-Run  Proportions 

As stated previously, the Markov models employed in this study have states that
are accessible from every other state, creating a condition known as positive recurrence.
A state $j$ is said to be positive recurrent if the expected number of transitions it takes
to start in and then return to state $j$ is finite (i.e., $m_j < \infty$) (Ross, 2014). The
mean recurrence time for any state is given by Equation 24.


$$m_j = \frac{1}{\pi_j}$$


Equation  24:  Mean  recurrence  time  for  state  j 


Where: 
$\pi_j$ = Long-run proportion of time spent in state $j$

The concept of recurrence leads directly to the idea of Markov model long-run
proportions $\pi_j$, the expected proportion of time a system will be in state $j$. The
long-run proportions of a Markov chain are closely associated with the eigenvalues of the
transition matrix $P$ (Ross, 2014); specifically, the vector of long-run proportions is a
left eigenvector of $P$ with eigenvalue 1. Additionally, like the state-specific
transition probabilities $P_{ij}$, the long-run proportions must also sum to 1. The
derivation of the two-state long-run proportions is provided in Equation 25.

$$\pi_0 = P_{00}\,\pi_0 + P_{10}\,\pi_1$$

$$\pi_1 = P_{01}\,\pi_0 + P_{11}\,\pi_1$$

$$\pi_0 + \pi_1 = 1$$

$$\pi_0 = \frac{P_{10}}{1 + P_{10} - P_{00}}, \qquad \pi_1 = \frac{P_{01}}{1 + P_{10} - P_{00}}$$

Equation  25:  Two-State  Long  Run  Proportions 


Where: 


$\pi_0$ = Long-run proportion of time spent not in conflict
$\pi_1$ = Long-run proportion of time spent in conflict

It should be noted that the long-run proportions $\pi_j$ can be approximated by
raising the transition probability matrix $P$ to a sufficiently high power, as demonstrated
in Equation 26.

$$\lim_{n \to \infty} P^{n} = \begin{bmatrix} \pi_0 & \pi_1 \\ \pi_0 & \pi_1 \end{bmatrix}$$

where, for the approximation,
$n \gg 50$
$n$ = number of periods into the future


Equation  26:  Long  Run  Proportion  Approximation 
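
The closed form of Equation 25 and the matrix-power approximation of Equation 26 can be compared directly. The Python sketch below is illustrative only: the helper names are hypothetical, and the inputs are Albania's base-year probabilities from Figure 14. It uses the identity 1 + P10 - P00 = P01 + P10, which follows from P00 = 1 - P01.

```python
def long_run_proportions(p01, p10):
    """Closed-form two-state long-run proportions (pi0, pi1)."""
    total = p01 + p10  # equals 1 + P10 - P00 because P00 = 1 - P01
    return p10 / total, p01 / total


def mat_mul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(P, n):
    """Return P raised to the n-th power by repeated multiplication."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


# Albania base-year probabilities from Figure 14.
p01, p10 = 0.0755922, 0.00201078
pi0, pi1 = long_run_proportions(p01, p10)
P = [[1.0 - p01, p01], [p10, 1.0 - p10]]

# Distance between the top-left entry of P^n and pi0 at two horizons.
gap_50 = abs(mat_pow(P, 50)[0][0] - pi0)
gap_500 = abs(mat_pow(P, 500)[0][0] - pi0)
print(round(pi0, 6), round(pi1, 6), gap_50, gap_500)
```

For this chain the rows of the 50-step matrix still differ noticeably from the long-run proportions, while the 500-step matrix matches them closely, which is consistent with requiring a power well above 50.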
Development of the Nation Specific Markov Models


The construction of the two-state transition probability matrix required that the
region-specific conditional models calculate the transition probabilities for the year 2014
data set. The process involved applying both conditional models for a specific region to
the same data set, resulting in four distinct transition probabilities for each of the 182
nations considered. Specifically, the transition probabilities $p_{00}$ and $p_{01}$ were
calculated from the “Not in Conflict” models, while $p_{10}$ and $p_{11}$ were calculated
from the “In Conflict” models. These probabilities were subsequently compiled into a
VBA-enabled Microsoft Excel workbook known as the “Conflict Transition Probability Markov
Chain Tool.” The tool automates the calculation of the n-step transition probabilities
for a specified time period, the first and second sojourn times and variances, the mean
recurrence times, and the long-run proportions for each state. An example of the
tool is shown in Figure 14.


Conflict Transition Probability Markov Chain Tool

Controls: Initialize Markov Chains | Run Markov Models | Number of Years into Future

1. Afghanistan (Status: Conflict)

                Year 2014                  Year 2018                  Year 2019
From / To       No Conflict   Conflict     No Conflict   Conflict     No Conflict   Conflict
No Conflict     0.96043009    0.0395699    0.85087309    0.149127     0.817206405   0.182794
Conflict        1.5314E-05    0.9999847    5.7715E-05    0.999942     7.07442E-05   0.999929

2. Albania (Status: No Conflict)

                Year 2014                  Year 2018                  Year 2019
From / To       No Conflict   Conflict     No Conflict   Conflict     No Conflict   Conflict
No Conflict     0.92440784    0.0755922    0.73104279    0.268957     0.676322497   0.323678
Conflict        0.00201078    0.9979892    0.00715438    0.992846     0.008609958   0.99139

Figure  14:  Conflict  Transition  Probability  Markov  Tool 
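
The forecast columns of Figure 14 can be approximated by repeatedly applying the base-year transition matrix to a nation's current status. The sketch below is written in Python purely for illustration (the study's tool is a VBA-enabled Excel workbook). It uses the Albania base-year probabilities from the figure and assumes that the 2018 and 2019 columns correspond to four and five one-year steps from the 2014 base year; the function name `forecast_conflict` is hypothetical.

```python
def forecast_conflict(p01, p10, in_conflict, base_year, years):
    """Return {year: (P(no conflict), P(conflict))} for each forecast year."""
    # Start from a known status in the base year.
    dist = (0.0, 1.0) if in_conflict else (1.0, 0.0)
    forecast = {base_year: dist}
    for step in range(1, years + 1):
        no_conf, conf = dist
        # One Markov step: new distribution = old distribution times P.
        dist = (no_conf * (1.0 - p01) + conf * p10,
                no_conf * p01 + conf * (1.0 - p10))
        forecast[base_year + step] = dist
    return forecast


# Albania: not in conflict in 2014, base-year probabilities from Figure 14.
albania = forecast_conflict(p01=0.0755922, p10=0.00201078,
                            in_conflict=False, base_year=2014, years=5)
for year, (no_conf, conf) in albania.items():
    print(year, round(no_conf, 6), round(conf, 6))
```

Under these assumptions the 2018 and 2019 conflict probabilities agree with the "No Conflict" rows of the Albania panel to within rounding of the published inputs.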
3.5 Summary

This  chapter  described  the  methodology  employed  to  construct  the  conditional 
conflict  database,  as  well  as  the  theory  and  methodology  guiding  the  development  of  the 
logistic  regression  models  that  are  used  to  calculate  the  transition  probabilities  required 
for the Markov models. The creation methodology provides both repeatability of this
study’s results and a means to evaluate additional alternatives associated with variable
selection and model development.

This  methodology  examined  30  statistical  variables  acquired  from  several  open 
sources  for  the  development  of  both  the  dependent  variable  and  the  conditional  logistic 
regression models. The data resources employed in this methodology are similar to, or
updates of, those used in the previous analytical efforts discussed in Chapter II. The data
sets utilized in this study are professionally created and maintained by reputable
organizations that strive to maintain the most current and accurate data. However, the
nature of data collection in less than fully permissive environments results in the
incomplete and often time-lagged data sets that form the basis of this study. Despite data
that are less than timely and less than perfect, there exist methods and techniques that
enable the construction and relevant analysis of robust conflict prediction models.

The strengths of the models developed for this study lie in their ability to
“operationalize” complex regional conflict environments by reducing them to the key
underlying factors that influence conflict transitions. Additionally, the combination of
logistic regression and Markov models enables long-range forecasting of worldwide conflict
trends that is not possible with logistic regression models alone. Moreover, the models
developed for this study are surprisingly not limited in their predictive power by the
quantity, quality, and in some instances the timeframe of the information currently
available. As will be discussed later in this work, seminal events such as the Arab Spring
or the rise of the Islamic State signal a possible paradigm shift in the relevant
predictors of violent conflict, a shift whose precursors and impacts may take several years
of data collection to fully realize. The implementation of the methodologies previously
described and the relevant analysis are presented in Chapter IV.




IV.  Analysis  and  Results 


"However,  the  pulse  of  the  God  of  War  is  hard  to  take.  If  you  want  to  discuss  war, 
particularly  the  war  that  will  break  out  tomorrow  evening  or  the  morning  of  the  day  after 
tomorrow,  there  is  only  one  way,  and  that  is  to  determine  its  nature  with  bated  breath, 
carefully  feeling  the  pulse  of  the  God  of  War  today.  ” 

Qiao  Liang,  Unrestricted  Warfare 


4.1  Chapter  Overview 

The  purpose  of  this  chapter  is  to  describe  and  analyze  the  results  of  the 
methodology  discussed  in  Chapter  III.  First,  in  Section  4.2,  we  discuss  the  construction 
and  validation  of  the  six  regional  conditional  logistic  regression  models.  Next,  Section 
4.3 provides an in-depth analysis of the significant variables by region and conditional
model. Subsequently, Section 4.4 examines the construction, validation, and results for
the nation-specific Markov models. Finally, in Section 4.5 we provide an analysis of
future  global  conflict  trends  developed  from  the  Markov  models. 

4.2  Analysis  of  Region  Specific  Conditional  Logistic  Regression  Models 
Development  of  the  Regional  Conditional  Logistic  Regression  Models 

The Purposeful Selection of Covariates method was employed in the construction
of all twelve conditional logistic regression models (two per region) used in this study.
The method provides a systematic means to efficiently construct meaningful and
operationally relevant models that achieve suitable classification accuracies in both the
training and validation data sets. Initial analysis of the logistic regression models focused
on maximizing the area under the curve (AUC) for the specific ROC curves while
simultaneously ensuring appropriate model and variable significance. The goal of
this study is an AUC greater than 0.80 for models with p-values less than or equal
to 0.05. As seen in Table 16, the conditional model developed for Sub-Saharan African
states classified as not in conflict the previous year is significant, with a p-value of
0.00001 and a corresponding AUC of 0.874, indicating that the model is both highly
significant and an excellent discriminator. Additionally, all seven variables have p-values
considerably less than 0.05, indicating high levels of significance and further reinforcing
the overall suitability of this model for conflict transition prediction.

Table  16:  Sub-Saharan,  Given  Non-Conflict  Logistic  Regression  Model 


Sub-Saharan Africa (Given Non-Conflict) Model

Variable                 Coefficient    G         P
Arable Land              7.801          13.640    0.000
Birth Rate               -0.474         8.740     0.003
Infant Mortality Rate    0.053          8.240     0.004
Youth Bulge              0.346          6.470     0.011
Refugee (Asylum)         5.91E-06       5.890     0.015
Trade (% GDP)            -0.052         7.830     0.005
Freedom Score            -5.637         12.760    0.000

Log-Likelihood = 43.446    G = 40.754    P = 0.00001    AUC = 0.874

Given the multitude of potential variable combinations for each model (30 candidate
variables admit on the order of $2^{30}$, or roughly one billion, distinct subsets), there
exist multiple potentially significant conditional models for each region. Consequently,
multiple distinct models were developed, analyzed, and compared to identify the optimal
conditional models for each region. If the initial analysis indicated that the models
achieved satisfactory significance and discrimination, an in-depth analysis of overall
model performance was conducted; an example of such a comparison is presented in Table 17.


Table  17:  Sub-Saharan  Africa  “Not  in  Conflict”  Conditional  Model  Comparison 


Sub-Saharan Africa (Given Non-Conflict) Model 1

Variable                 Coefficient    G         P
Arable Land              7.801          13.640    0.000
Birth Rate               -0.474         8.740     0.003
Infant Mortality Rate    0.053          8.240     0.004
Youth Bulge              0.346          6.470     0.011
Refugee (Asylum)         5.91E-06       5.890     0.015
Trade (% GDP)            -0.052         7.830     0.005
Freedom Score            -5.637         12.760    0.000

Log-Likelihood = 43.446    G = 40.754    P = 0.000    AUC = 0.874

Training Data Set Model, Sub-Saharan Africa (Right Node): 2004-2010

                                         Observed
Classified                               Transition to    Remain/Transition      Total
                                         Conflict = 1     out of Conflict = 0
Transition to Conflict = 1               6                4                      10
Remain/Transition out of Conflict = 0    24               155                    179
Total                                    30               159                    189

Cut Point: 0.50    Model Accuracy: 0.852

Validation Data Set Model, Sub-Saharan Africa (Right Node): 2011-2013

                                         Observed
Classified                               Transition to    Remain/Transition      Total
                                         Conflict = 1     out of Conflict = 0
Transition to Conflict = 1               2                0                      2
Remain/Transition out of Conflict = 0    10               61                     71
Total                                    12               61                     73

Cut Point: 0.50    Model Accuracy: 0.863

Sub-Saharan Africa (Given Non-Conflict) Model 2

Variable                 Coefficient    G         P
Arable Land              6.574          11.300    0.001
Population Growth        -1.991         11.820    0.001
Infant Mortality Rate    0.028          3.690     0.055
Caloric Intake           1.70E-03       6.170     0.013
Refugee (Asylum)         0.000          5.690     0.017
Trade (% GDP)            -0.052         8.890     0.003
Freedom Score            -6.575         14.590    0.000

Log-Likelihood = 41.656    G = 44.334    P = 0.000    AUC = 0.873

Training Data Set Model, Sub-Saharan Africa (Right Node): 2004-2010

                                         Observed
Classified                               Transition to    Remain/Transition      Total
                                         Conflict = 1     out of Conflict = 0
Transition to Conflict = 1               5                1                      6
Remain/Transition out of Conflict = 0    25               158                    183
Total                                    30               159                    189

Cut Point: 0.50    Model Accuracy: 0.862

Validation Data Set Model, Sub-Saharan Africa (Right Node): 2011-2013

                                         Observed
Classified                               Transition to    Remain/Transition      Total
                                         Conflict = 1     out of Conflict = 0
Transition to Conflict = 1               0                0                      0
Remain/Transition out of Conflict = 0    12               61                     73
Total                                    12               61                     73

Cut Point: 0.50    Model Accuracy: 0.836

In the example illustrated in Table 17, two distinct conditional models were
developed for Sub-Saharan African states classified as not in conflict. In this example,
Model 1 is the same model shown in Table 16, while Model 2 has replaced the variables
“Birth Rate” and “Youth Bulge” with “Population Growth” and “Caloric Intake”
respectively. Initial analysis indicates satisfactory significance for both models and their
respective variables and equivalent discriminatory powers as indicated by their AUC
values. Model performance was then compared using the training and validation
data sets as described previously, with a classification cut-point fixed at 0.50 for all
comparisons. The analysis focused on four criteria in descending order: (1) overall
predictive accuracy of the model on the validation data set, (2) overall predictive
accuracy of the model on the training data set, (3) overall ability to properly classify
rare events (transitions from the current state) in both data sets, and (4) minimum number
of “False Negatives” in the validation data set, shown in the bottom left-hand quadrant of
the classification table.

In this example, a rare event is a “Transition to Conflict,” which occurs
in 42 of the 262 total instances across both data sets. Analysis of the results shows that
Model 1 outperforms Model 2 in three of the four criteria: higher validation
predictive accuracy (0.863), better classification of rare events (8 of 42 transitions), and
10 false negatives in the validation data set as opposed to 12 for Model 2. Despite having
a slightly lower training data set accuracy (0.852), attributed to correctly classifying
only 155 of the 159 true-negative instances, Model 1 is considered by our criteria to be the
superior of the two prospective “non-conflict” conditional models for the Sub-Saharan
Africa region. Similar analyses were conducted on the conditional models for all regions,
ultimately identifying the 12 conditional models used in this study.
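
The comparison criteria can be computed directly from the cells of a classification table. The Python sketch below is a minimal illustration: the function name and dictionary keys are hypothetical, and the counts are the validation-set cells for Models 1 and 2 reported in Table 17.

```python
def classification_summary(tp, fp, fn, tn):
    """Summarize a 2x2 classification table.

    tp: transitions to conflict classified as transitions
    fp: non-transitions classified as transitions
    fn: transitions classified as non-transitions (false negatives)
    tn: non-transitions classified as non-transitions
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "rare_events_caught": tp,
        "rare_events_total": tp + fn,
        "false_negatives": fn,
    }


# Validation data set (2011-2013) cells from Table 17.
model1 = classification_summary(tp=2, fp=0, fn=10, tn=61)
model2 = classification_summary(tp=0, fp=0, fn=12, tn=61)
print(round(model1["accuracy"], 3), model1["false_negatives"])
print(round(model2["accuracy"], 3), model2["false_negatives"])
```

Model 2 shows why accuracy alone is insufficient for rare events: it reaches 0.836 on the validation set without classifying a single transition correctly.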

The  objective  of  this  model  building  strategy  was  the  construction  of 
parsimonious  conditional  logistic  regression  models  that  achieve  prediction  accuracies  in 
excess  of  80%  for  both  the  training  and  validation  data  sets.  A  summary  of  the  final 
conditional  models  for  each  region  is  provided  in  Appendix  B. 
Model Validation and Analysis

Three  methods  were  employed  in  the  validation  and  analysis  of  the  12  conditional 
logistic  regression  models:  (1)  Receiver  Operating  Characteristic  area  under  the  curve 
values,  (2)  Classification  Accuracy,  and  (3)  Hosmer-Lemeshow  Goodness  of  Fit  Tests. 
These  analyses  assess  the  suitability  of  the  conditional  logistic  regression  models  in  terms 
of  overall  discriminative  power,  model  accuracy  (with  an  emphasis  on  rare-events),  and 
the  model’s  approximation  of  the  data. 

Receiver  Operating  Characteristic  Area  Under  the  Curve  Analysis 

As stated earlier, the Receiver Operating Characteristic (ROC) curve graphically
depicts a model’s ability to detect a signal in the presence of noise across the entire range
of possible cut-points. ROC curves are developed for both the training and validation
data sets as a means to assess overall model performance. An example of a typical set
of training and validation ROC curves for a conditional model is provided in Figure 15.


[Figure panel title: Sub-Saharan Africa (Given Non-Conflict) ROC Curve]


Figure  15:  Graph  of  Training  and  Validation  ROC  Curves. 
Visual inspection of the two ROC curves shows that the model in this example
provides better discrimination than would be obtained by a simple binary guess (e.g., a
fair coin toss), represented by the AUC = 0.50 diagonal line that bisects the graph. It is
also readily apparent that the discriminative power of the model on the training data
exceeds that on the validation data, a logical result that is a product of our model
building strategy. To
understand  the  model’s  performance  more  precisely,  we  employ  AUC  analysis  using  the 
criteria  described  in  Chapter  III.  Table  18  summarizes  the  AUC  values  for  both  the 
training  and  validation  data  sets  for  all  conditional  models  across  the  six  geographic 
regions  and  as  a  combined  world  model. 

Table  18:  AUC  Values  by  Region  and  Model 


Receiver Operating Characteristic AUC Scores by Region

                                                   Training Data Set         Validation Data Set
Region                           Model             AUC      Assessment       AUC      Assessment
Arab & North African States      In Conflict       0.962    Superior         0.500    No Model Discrimination
                                 Not in Conflict   0.930    Superior         0.520    Poor
Eastern Europe & Central Asia    In Conflict       0.972    Superior         0.659    Poor
                                 Not in Conflict   0.946    Superior         0.651    Poor
Latin America                    In Conflict       0.878    Excellent        0.750    Acceptable
                                 Not in Conflict   0.952    Superior         0.776    Acceptable
OECD                             In Conflict       0.914    Superior         0.561    Poor
                                 Not in Conflict   0.974    Superior         0.735    Acceptable
South & East Asia                In Conflict       0.938    Superior         0.689    Poor
                                 Not in Conflict   0.932    Superior         0.696    Poor
Sub-Saharan Africa               In Conflict       0.889    Excellent        0.704    Acceptable
                                 Not in Conflict   0.874    Excellent        0.796    Acceptable
Combined World Model             In Conflict       0.887    Excellent        0.655    Poor
                                 Not in Conflict   0.922    Superior         0.743    Acceptable

The assessed performance of the models on the training data set ranges
from excellent discrimination (0.80 < AUC < 0.90) to superior discrimination (AUC >
0.90), with 8 of the 12 models assessed as superior discriminators on the training data
set. However, there is a noticeable degradation in model performance on the validation
data sets, with performance assessments generally ranging from poor discrimination
(0.50 < AUC < 0.70) to acceptable discrimination (0.70 < AUC < 0.80). Of interest is
the significant decline in performance of the Arab and North African conditional
models, each of which loses more than 40% of its AUC from the training to the
validation data set. Initially, it was theorized that the relative rarity of
conflict transitions was the root cause of the degradation in performance. However,
analysis of the validation sets for the other regional models shows that the rarity of
transitions is not highly correlated with validation model performance. Another possible
explanation for the degradation of the validation model performance may be linked to the
“Arab Spring.” The Arab Spring and its resulting conflicts have continued to engulf
Southwest Asia and North Africa since the Tunisian revolution. The timing of these events
is significant to the Arab & North African models because the resulting conflicts in
otherwise stable regimes occur only in the validation data sets.
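
The assessment bands quoted above can be written as a small lookup. The Python sketch below is illustrative only: the function name is hypothetical, and the treatment of values that fall exactly on a band boundary (assigned to the higher band, with 0.50 itself treated as no discrimination) is an assumption, since the text states the bands with strict inequalities.

```python
def assess_auc(auc):
    """Map an AUC value to the qualitative bands used in Table 18."""
    if auc <= 0.50:
        return "No Model Discrimination"
    if auc < 0.70:
        return "Poor"
    if auc < 0.80:
        return "Acceptable"
    if auc < 0.90:
        return "Excellent"
    return "Superior"


# Arab & North African "In Conflict" model, training versus validation AUC.
train_auc, valid_auc = 0.962, 0.500
relative_loss = (train_auc - valid_auc) / train_auc
print(assess_auc(train_auc), assess_auc(valid_auc), round(relative_loss, 3))
```

The relative loss computed here is one way to read the "in excess of 40%" statement, namely as the drop in AUC expressed as a fraction of the training AUC.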

Another  interesting  occurrence  that  is  observed  in  the  AUC  scores  is  the  general 
trend  for  “Not  in  Conflict”  models  to  experience  better  performance  at  both  regional  and 
combined  world  levels  than  their  “In  Conflict”  counterparts.  This  trend  is  observed  in 
both  the  training  and  validation  data  sets,  and  it  occurs  in  19  of  the  24  instances  presented 
in Table 18. A possible explanation of this phenomenon may be related to the inability to
accurately collect data from nations experiencing conflict. The results suggest that the
data associated with nations that transition to or remain out of conflict provide better
predictive performance than the data for nations that tend to be in conflict.
Analysis of Model Validity Based on Classification Accuracy

Analysis of classification tables provides a second method for assessing logistic
regression model validity and suitability. Classification table analysis, as it pertains to
this study, focuses on three areas: (1) overall model accuracy, (2) percentage of rare
events properly classified, and (3) model false negative rate. Initial analyses fixed the
classification table cut point at 0.50, thereby classifying all instances with
$\pi_j < 0.50$ as transitioning/remaining out of conflict, and all instances with
$\pi_j > 0.50$ as transitioning/remaining in conflict. As stated previously, the objective
of our model building strategy is to construct models that achieve classification
accuracies in excess of 80% for both the training and validation data sets. A summary of
the overall model accuracies using the fixed cut-point of 0.50 is presented in Table 19,
which details the accuracies and total instances per data set for each model.


Table  19:  Overall  Classification  Accuracies  Given  Fixed  Cut-point  of  0.50 


Model Accuracies Using 0.50 Classification Cut Point

                                                               Training Data Set           Validation Data Set         Training and
Region                           Model             Cut Point   Accuracy   No. Instances    Accuracy   No. Instances    Validation
Arab & North African States      In Conflict       0.50        94.2%      52               74.4%      43               85.3%
                                 Not in Conflict   0.50        93.3%      60               60.0%      15               86.7%
Eastern Europe & Central Asia    In Conflict       0.50        92.5%      67               82.8%      29               89.6%
                                 Not in Conflict   0.50        86.0%      171              76.7%      43               84.1%
Latin America                    In Conflict       0.50        81.1%      37               83.3%      30               82.1%
                                 Not in Conflict   0.50        90.9%      132              88.1%      42               90.2%
OECD                             In Conflict       0.50        88.7%      53               86.4%      22               88.0%
                                 Not in Conflict   0.50        96.0%      126              95.0%      101              95.6%
South & East Asia                In Conflict       0.50        87.3%      79               84.1%      44               86.2%
                                 Not in Conflict   0.50        87.9%      66               88.0%      25               87.9%
Sub-Saharan Africa               In Conflict       0.50        86.4%      154              82.4%      74               85.1%
                                 Not in Conflict   0.50        85.2%      189              86.3%      73               85.5%
Combined World Results           In Conflict       0.50        88.2%      442              81.8%      242              86.0%
                                 Not in Conflict   0.50        89.1%      744              87.0%      299              88.5%
Model accuracies exceeded the 80% classification accuracy benchmark in all 12
training data sets and in 9 of the 12 validation data sets. Training data set accuracies
averaged 88.2% for “In Conflict” conditional models and 89.1% for “Not in Conflict”
models, with 5 of the 12 training data sets yielding accuracies above 90%. As expected,
the models experience some degradation in their classification accuracies when applied to
the validation data set, but they still achieve average accuracies of 81.8% and 87.0% for
the “In Conflict” and “Not in Conflict” models respectively. The overall (combined
training and validation) classification accuracies exceed the 80% benchmark for all
regions and are considered suitable for the purposes of this study. Similar to the AUC
analysis, the “Not in Conflict” models generally experience greater predictive accuracies
than their “In Conflict” counterparts, with the phenomenon observed in 19 of the 24
instances provided in Table 19. Exceptions to this trend occur more frequently in the
Arab & North African and the Eastern Europe & Central Asian models than in the rest of
the regions.

These  results  compare  favorably  with  historical  studies  which  have  struggled  to 
achieve  prediction  accuracies  greater  than  80%.  Studies  such  as  the  CAA-led  Forecast 
and  Analysis  of  Complex  Threats  (Reed,  2013)  or  the  Political  Instability  Task  Force’s 
global  forecasting  model  (Goldstone,  et  al.,  2005)  only  achieve  accuracies  greater  than 
80%  on  limited  and  very  specific  data  sets.  On  the  other  hand,  the  Boekestein  model 
achieved  accuracies  approaching  80%  without  implementing  special  conditions  to  enable 
prediction  accuracy;  these  model  accuracies  were  subsequently  compared  to  those 
developed  by  this  study  (Boekestein,  2015).  To  enable  a  one-to-one  model  comparison 
by region, we have developed weighted regional accuracies for both the training and
validation data sets; Table 20 compares this study’s results with those of the recent
Boekestein study. Both models perform well on the training data sets at the regional
level: every training accuracy exceeds 80% except the Boekestein model’s 77.38% for
Eastern Europe & Central Asia. Regional training performance is broadly similar, but the
conditional logistic regression / Markov chain (C-LR/MC) model developed for this study
achieves the higher prediction accuracy at the combined world level (88.76% versus
86.63%). On the validation data sets, the C-LR/MC model realizes a clear improvement
over the Boekestein model: it attains higher prediction accuracies in each of the six
regions, and an 84.67% weighted prediction accuracy at the combined world level,
compared with 78.30%.
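
The combined world figures can be checked against Table 19. The sketch below assumes that "weighted" means an average of the two conditional model accuracies weighted by their number of instances; the function name is illustrative and does not come from the study.

```python
def weighted_accuracy(models):
    """Instance-weighted average of (accuracy, n_instances) pairs."""
    total = sum(n for _, n in models)
    return sum(acc * n for acc, n in models) / total

# Combined World Results rows of Table 19: ("In Conflict", "Not in Conflict").
world_training = [(0.882, 442), (0.891, 744)]
world_validation = [(0.818, 242), (0.870, 299)]

print(f"training:   {weighted_accuracy(world_training):.2%}")
print(f"validation: {weighted_accuracy(world_validation):.2%}")
```

Under that assumption the two values agree with the 88.76% and 84.67% reported for the C-LR/MC model in Table 20.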


Table 20: Comparison of Model Accuracies with the Boekestein Model

Comparison of Boekestein Model Accuracies with Conditional Logistic Regression/Markov Chain Weighted Accuracies by Region

                                Training Data Set Accuracies               Validation Data Set Accuracies
Region                          Boekestein Model   Conditional LR/MC       Boekestein Model   Conditional LR/MC
                                                   Weighted Accuracies                        Weighted Accuracies
Arab & North African States     84.31%             93.72%                  70.59%             70.68%
Eastern Europe & Central Asia   77.38%             87.83%                  75.00%             79.16%
Latin America                   90.12%             88.75%                  77.78%             86.10%
OECD                            95.96%             93.84%                  92.42%             93.46%
South & East Asia               90.48%             87.57%                  76.79%             85.51%
Sub-Saharan Africa              82.31%             85.74%                  74.49%             84.34%
Combined World Results          86.63%             88.76%                  78.30%             84.67%

Three of the validation data sets, both Arab & North African conditional models
and the Eastern Europe & Central Asia “Not in Conflict” model, fail to achieve
accuracies greater than 80%. The classification tables for these three validation data sets
are shown in Table 21.

Table 21: Validation Data Set Accuracies below Accuracy Benchmark of 80%

Arab States (Given Conflict): 2011-2013

                                    Observed
Classified                          Transition/Remain in   Transition out of   Total
                                    Conflict = 1           Conflict = 0
Transition/Remain in Conflict = 1   32                     1                   33
Transition out of Conflict = 0      10                     0                   10
Total                               42                     1                   43

Model Cut Point: 0.50    Model Accuracy: 0.744

Arab States (Given Non-Conflict): 2010-2013

                                        Observed
Classified                              Transition to    Remain/Transition out   Total
                                        Conflict = 1     of Conflict = 0
Transition to Conflict = 1              1                2                       3
Remain/Transition out of Conflict = 0   4                8                       12
Total                                   5                10                      15

Model Cut Point: 0.50    Model Accuracy: 0.600

E. Europe & Central Asia (Given Non-Conflict): 2011-2013

                                        Observed
Classified                              Transition to    Remain/Transition out   Total
                                        Conflict = 1     of Conflict = 0
Transition to Conflict = 1              2                5                       7
Remain/Transition out of Conflict = 0   5                31                      36
Total                                   7                36                      43

Model Cut Point: 0.50    Model Accuracy: 0.767

The effects of the Arab Spring on model accuracy become apparent in the Arab
and North African models, specifically in the “In Conflict” model, which misclassifies 11
of the 43 instances. The model, developed using data that completely pre-dates the Arab
Spring, achieves an accuracy of 74.4% and classifies nearly a quarter (10) of the total
instances as transitioning out of conflict, when in reality only one such transition occurs
during the 2011 to 2013 time period (i.e., Oman in 2011 - 2012). The Arab & North
African “Not in Conflict” model experienced even greater misclassification rates (40% of
all instances misclassified), resulting in an overall classification accuracy of 60% for the
validation model. However, three of the misclassified transitions, Libya (2010 - 2011),
Syria (2010 - 2011), and Tunisia (2010 - 2011), all of which transition into conflict in
2011, are directly related to the Arab Spring, and it is likely these nations would have
remained out of conflict had this event not occurred.
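
The accuracies quoted for the two Arab & North African validation models follow directly from the classification counts in Table 21. A minimal sketch, with the matrix layout (rows as classified state, columns as observed state) chosen here for illustration:

```python
def accuracy(matrix):
    """Share of instances on the diagonal of a 2x2 classification table.

    matrix[i][j] counts instances classified as class i and observed as
    class j, with the classes ordered (1, 0).
    """
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total

arab_given_conflict = [[32, 1], [10, 0]]      # 2011-2013
arab_given_non_conflict = [[1, 2], [4, 8]]    # 2010-2013

print(round(accuracy(arab_given_conflict), 3))      # 0.744
print(round(accuracy(arab_given_non_conflict), 3))  # 0.6
```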

As  noted  previously,  the  initial  models  developed  for  the  Eastern  Europe  and 
Central  Asia  “Not  in  Conflict”  data  experienced  numerous  classification  issues,  often 
failing  to  properly  classify  any  transitions  into  conflict.  As  a  result,  the  Synthetic 
Minority  Oversampling  Technique  (SMOTE)  was  employed  to  aid  development  of  a 
model  that  achieved  satisfactory  classification  accuracy  in  both  the  training  and 
validation data sets. The initial models maximized the likelihood of these nations
transitioning or remaining out of conflict, resulting in significant false-negative rates (in
excess of 20% of all instances), the complete failure to classify any nation as
transitioning into conflict, and model accuracies in the 70% range. Despite failing to
generate validation classification accuracies above 80%, the final “Not in Conflict” model
is a significant improvement over the earlier versions, providing better overall
classification accuracy with reduced false-negative rates.
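
The study does not describe its SMOTE implementation, so the following is only a sketch of the basic idea: new minority-class instances are created by interpolating between an existing minority instance and one of its nearest minority neighbours. The function name, the parameter k, and the two-feature data are all hypothetical.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on the segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic

# Hypothetical minority class: four observed transitions into conflict.
transitions = [(0.20, 1.0), (0.30, 1.2), (0.25, 0.9), (0.40, 1.1)]
new_points = smote(transitions, n_new=6)
print(len(new_points))  # 6
```

Because every synthetic point lies between two observed minority points, oversampling of this kind rebalances the classes without introducing values outside the observed range.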

The difficulty of building accurate models for Eastern Europe and Central Asia may be
the result of an ethnically diverse and geographically widespread region that straddles
both Eastern and Western civilization. The conflicts within this region generally take on
two forms: in the east, conflicts are generally the result of long-standing tribal conflicts
and foreign intervention, while in the west, financial crises, immigration, and political
turmoil (notably in the former Soviet states) exacerbate political and societal instability.
Ultimately, future studies may wish to explore a realignment of the nations within this
geographic region in order to improve model performance.

While overall model accuracies are considered to meet or exceed expectations,
further analysis is required to ascertain model performance concerning rare events (i.e.,
transitions into or out of conflict). The principle of maximum likelihood will favor the
majority population in any data set, at the expense of the minority. With transition rates
ranging from 5-20% across all data sets, it is possible to achieve benchmark classification
accuracies simply by properly classifying only the majority population of the conditional
model. As part of the overall model assessment, rare event accuracies must be taken into
account.
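
To illustrate why overall accuracy alone is not sufficient, the sketch below computes the accuracy of a trivial classifier that never predicts a transition, using training set instance counts from Table 19 and observed transition counts from Table 22. The three models shown are an arbitrary selection, and the function name is illustrative.

```python
def majority_baseline(n_instances, n_transitions):
    """Accuracy obtained by never predicting a transition."""
    return (n_instances - n_transitions) / n_instances

# (instances, observed transitions, model accuracy at the 0.50 cut-point)
training_sets = {
    "Sub-Saharan Africa, Not in Conflict": (189, 30, 0.852),
    "South & East Asia, In Conflict": (79, 13, 0.873),
    "OECD, Not in Conflict": (126, 9, 0.960),
}

for name, (n, rare, model_acc) in training_sets.items():
    base = majority_baseline(n, rare)
    print(f"{name}: baseline {base:.1%}, model {model_acc:.1%}")
```

For all three selections the do-nothing baseline already exceeds 80%, which is why the rare event accuracies are examined separately.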


Table 22: Model Rare Event Accuracies Given Fixed Cut-point of 0.50

Model Rare Event Accuracies Using 0.50 Classification Cut Point

                                                            Training Data Set        Validation Data Set      Training and
Region                          Model            Cut Point  Accuracy  No. Instances  Accuracy  No. Instances  Validation
Arab & North African States     In Conflict      0.50       60.0%     5              0.0%      1              50.0%
                                Not in Conflict  0.50       66.7%     9              20.0%     5              50.0%
Eastern Europe & Central Asia   In Conflict      0.50       76.9%     13             50.0%     6              68.4%
                                Not in Conflict  0.50       79.7%     64             28.6%     7              74.6%
Latin America                   In Conflict      0.50       42.9%     7              0.0%      2              33.3%
                                Not in Conflict  0.50       61.1%     18             37.5%     8              53.8%
OECD                            In Conflict      0.50       37.5%     8              0.0%      3              27.2%
                                Not in Conflict  0.50       55.6%     9              50.0%     4              53.8%
South & East Asia               In Conflict      0.50       23.1%     13             33.3%     6              26.3%
                                Not in Conflict  0.50       41.7%     12             0.0%      2              35.7%
Sub-Saharan Africa              In Conflict      0.50       50.0%     26             36.4%     11             45.9%
                                Not in Conflict  0.50       20.0%     30             16.7%     12             19.1%
Combined World Results          In Conflict      0.50       48.6%     72             31.0%     29             43.6%
                                Not in Conflict  0.50       59.2%     142            26.3%     38             52.2%

Rare  event  classification  accuracies  by  region  and  model  are  provided  in  Table 
22.  Across  all  regions,  the  “In  Conflict”  models  correctly  classified  35  of  the  72  (48.6%) 
transitions  out  of  conflict,  and  the  “Not  in  Conflict”  models  correctly  classified  84  of  the 
142  (59.2%)  transitions  into  conflict  for  all  twelve  training  set  models.  Expectedly, 
validation  rare-event  classification  accuracies  are  generally  lower  than  their  training 
counterparts at the regional level; in aggregate, 9 of 29 (31.0%) transitions out of conflict and 10
of 38 (26.3%) transitions into conflict are properly classified. It is
observed that 4 of the 12 validation models failed to properly classify a single transition
instance, shown in Table 22 as an accuracy of 0.0%. However, in each of these
cases  the  total  number  of  observed  transitions  is  less  than  or  equal  to  3,  resulting  in  below 
average  transition  rates  for  the  four  regions.  It  is  therefore  assessed  that  overall  model 
suitability  is  not  affected  by  this  singular  result.  Ultimately,  model  rare  event 
classification  accuracies,  for  the  aggregate  data  sets,  average  43.6%  for  “In  Conflict” 
models,  and  52.2%  for  “Not  in  Conflict”  models,  which  is  considered  acceptable  given 
the  above  overall  predictive  accuracies  of  the  logistic  regression  models  combined  with 
the  relative  rarity  of  conflict  transitions  within  the  data  set. 
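
The aggregate rare event accuracies can be recovered by pooling the training and validation counts quoted above, and the same arithmetic shows how coarse the measure becomes when only a handful of transitions are observed. This is a sketch; the function name is illustrative.

```python
def pooled_rate(parts):
    """Pool (correctly classified, observed) transition counts across data sets."""
    correct = sum(c for c, _ in parts)
    observed = sum(n for _, n in parts)
    return correct / observed

# Aggregate counts from the text: (training, validation).
in_conflict = [(35, 72), (9, 29)]
not_in_conflict = [(84, 142), (10, 38)]

print(f"In Conflict:     {pooled_rate(in_conflict):.1%}")      # 43.6%
print(f"Not in Conflict: {pooled_rate(not_in_conflict):.1%}")  # 52.2%

# With only 3 observed transitions, each one is worth a third of the score.
print([f"{hits / 3:.1%}" for hits in range(4)])
```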

The final classification table analysis involves adjusting the cut-point in order to
limit the number of false-negative classifications while maintaining suitable model
accuracy. In this study, a false-negative is defined as a nation classified as
transitioning/remaining out of conflict, when in fact the nation remains in or transitions
into conflict. Given the operational implications of misclassifying a potential transition
into conflict, it is arguably better to reduce the model’s false-negative rate, which is
achieved by lowering the cut-point for each conditional model, than to misclassify a
nation as being “Not in Conflict”, even at the cost of additional false positives. A typical
cut-point analysis is presented in Figure 16, which graphs the conditional model accuracy,
false-negative rate, and false-positive rate as a function of the probability cut-point. As is
the case for all models, the false-negative rate declines as the cut-point approaches zero.
The vertical dashed lines represent the JMP-default and adjusted cut-points.

[Figure 16 image not reproduced. Panels: Sub-Sahara Africa (In Conflict) Model Accuracy vs. Probability Cut-off, for the Training Data Set and the Validation Data Set. Series: Predictive Accuracy, False Negatives, False Positives, JMP Default Cut Point. Horizontal axis: Probability Cut-off.]

Figure 16: Analysis of Cut-Point Effects on Classification Accuracy and False-Negative Rates

Adjustment of the classification table cut-point seeks to balance three objectives:
minimize the false-negative rate, maintain model accuracy, and minimize the deviation
from the JMP default cut-point of 0.50 for both the training and validation models.
Minimizing the deviation of the adjusted cut-point from the JMP default is desired
because it limits the model’s false-positive rate, which increases as the cut-point
approaches zero. The adjusted cut-points were set to values less than the JMP default in
10 of the 12 models. In the remaining two cases, the default cut-point was maintained
because lowering it produced no appreciable improvement in the training or validation
models’ accuracy or false-negative rate. A summary of the adjusted cut-points’ effects on
model accuracies and false-negative rates is presented in Table 23.
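
The trade-off described above can be sketched with a small cut-point sweep. The probabilities and outcomes below are hypothetical, and the rates are assumed to be computed relative to the observed class (false negatives over observed conflict instances, false positives over observed non-conflict instances); the study's own definitions may differ.

```python
def rates_at_cut_point(probs, observed, cut):
    """Accuracy, false-negative rate and false-positive rate when an instance
    is classified as 1 (in conflict next year) if its probability >= cut."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, observed):
        predicted = 1 if p >= cut else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    accuracy = (tp + tn) / len(observed)
    fnr = fn / (fn + tp) if fn + tp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return accuracy, fnr, fpr

# Hypothetical fitted probabilities and observed next-year states.
probs = [0.95, 0.80, 0.62, 0.45, 0.38, 0.33, 0.20, 0.10]
observed = [1, 1, 1, 1, 0, 1, 0, 0]

for cut in (0.50, 0.40, 0.30):
    acc, fnr, fpr = rates_at_cut_point(probs, observed, cut)
    print(f"cut={cut:.2f} accuracy={acc:.3f} FNR={fnr:.3f} FPR={fpr:.3f}")
```

In this toy data, lowering the cut-point from 0.50 to 0.40 removes a false negative at no cost, while lowering it further to 0.30 removes the last false negative but introduces a false positive, which is the pattern that motivates keeping the adjusted cut-point as close to the default as possible.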

Table 23: Effects of Adjusted Cut-points on Model Accuracy and False Negative Rates

Model Accuracies Seeking to Minimize False Negative Classifications in Training & Validation Models

                                                            Training Data Set                           Validation Data Set
Region                          Model            Cut Point  Accuracy  False-Positive  Effects on Model  Accuracy  False-Positive  Effects on Model
                                                                      Decrease        Accuracy                    Decrease        Accuracy
Arab & North African States     In Conflict      0.30       94.2%     -100.0%         0.0%              86.0%     -50.0%          11.6%
                                Not in Conflict  0.15       81.7%     -33.3%          -11.6%            66.7%     -25.0%          6.7%
Eastern Europe & Central Asia   In Conflict      0.34       92.5%     -100.0%         0.0%              86.2%     -50.0%          3.4%
                                Not in Conflict  0.33       87.1%     -84.6%          1.1%              76.7%     0.0%            0.0%
Latin America                   In Conflict      0.45       83.7%     -66.7%          6.7%              90.0%     -66.7%          6.7%
                                Not in Conflict  0.40       90.9%     -28.6%          0.0%              88.1%     0.0%            0.0%
OECD                            In Conflict      0.50       88.7%     0.0%            0.0%              86.4%     0.0%            0.0%
                                Not in Conflict  0.30       96.0%     -25.0%          0.0%              94.1%     0.0%            -0.9%
South & East Asia               In Conflict      0.50       87.3%     0.0%            0.0%              84.1%     0.0%            0.0%
                                Not in Conflict  0.42       84.8%     0.0%            -3.1%             84.0%     -50.0%          -4.0%
Sub-Saharan Africa              In Conflict      0.30       85.1%     -87.5%          -1.3%             87.8%     -83.3%          5.4%
                                Not in Conflict  0.30       85.2%     -16.7%          0.0%              86.3%     0.0%            0.0%
Combined World Results          In Conflict      0.40       87.3%     -53.3%          0.5%              86.8%     -29.2%          2.5%
                                Not in Conflict  0.32       84.5%     -34.5%          -0.3%             84.9%     -3.6%           -1.0%

Adjusted cut-point values were tailored to each conditional model and ranged
from 0.15 to 0.50, with the average cut-point set to 0.40 and 0.32 for the world level
aggregate “In Conflict” and “Not in Conflict” models, respectively. These average
cut-points have negligible adverse impacts on overall and rare-event accuracies, and in
many cases offer modest improvements at the regional level. The adjusted cut-points also
result in an overall decrease in the conditional model false-negative rates at the aggregate
world level for both the “In Conflict” and “Not in Conflict” models.

Analysis of Hosmer-Lemeshow Goodness of Fit Tests

The Hosmer-Lemeshow Goodness of Fit test provides the third and final method
to assess the overall suitability of the logistic regression models. The Hosmer-Lemeshow
test assesses the fitted transition probabilities, $\pi(x_j)$, generated for each model
instance, as they relate to the observed transition state. Model subgroupings were tailored
to the individual models based on the number of occurrences in the training model set and
their corresponding transition probabilities. The design objective is to construct 10
equally sized sub-groups, providing a corresponding test statistic of
$\chi^2_{(0.05,\,8)} = 15.507$. However, smaller numbers of sub-groups were employed in
5 of the 12 tests. Via Equation 16, we are able to develop the Hosmer-Lemeshow statistic
($C$) and compare it to its corresponding Chi-square test statistic for each model. The
assessed fit of a particular model is considered satisfactory if
$C < \chi^2_{(0.05,\,g-2)}$, where $g$ is the number of sub-groups. The results of this
analysis are summarized in Table 24.
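
The test can be sketched as follows. Equation 16 is not reproduced in this section, so the code assumes the standard form of the Hosmer-Lemeshow statistic, in which each sub-group contributes the squared difference between observed and expected events divided by n_k * p_k * (1 - p_k). The grouping rule, the function names, and the data are illustrative; the critical values are those listed in Table 24.

```python
def hosmer_lemeshow(probs, observed, groups=10):
    """Hosmer-Lemeshow statistic C over `groups` roughly equal-sized bins of
    instances sorted by fitted probability (standard form, assumed here)."""
    pairs = sorted(zip(probs, observed))
    n = len(pairs)
    c = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_k = len(chunk)
        if n_k == 0:
            continue
        mean_p = sum(p for p, _ in chunk) / n_k
        if mean_p in (0.0, 1.0):
            continue  # term is undefined when the bin's variance is zero
        o_k = sum(y for _, y in chunk)  # observed events in the bin
        e_k = n_k * mean_p              # expected events in the bin
        c += (o_k - e_k) ** 2 / (e_k * (1 - mean_p))
    return c

# Critical values chi-square(0.05, g - 2), keyed by the number of groups g.
CRITICAL = {3: 3.841, 4: 5.991, 5: 7.815, 10: 15.507}

def fits(probs, observed, groups=10):
    """True when C falls below the critical value for g - 2 degrees of freedom."""
    return hosmer_lemeshow(probs, observed, groups) < CRITICAL[groups]

# Hypothetical data: 20 instances in four probability bands.
probs = [0.1] * 5 + [0.3] * 5 + [0.6] * 5 + [0.9] * 5
observed = [1, 0, 0, 0, 0] + [1, 0, 0, 0, 0] + [1, 1, 1, 0, 0] + [1, 1, 1, 1, 1]

print(round(hosmer_lemeshow(probs, observed, groups=4), 3))
print(fits(probs, observed, groups=4))
```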


Table 24: Hosmer-Lemeshow Goodness of Fit Test Results

Hosmer-Lemeshow Goodness of Fit Results given α = 0.05

Region                          Model            H-L Statistic (C)  Test Statistic  P{T.S. > C}   Assessment
Arab & North African States     In Conflict      1.550              5.991           0.461         Model appears to fit the data well.
                                Not in Conflict  6.390              15.507          0.604         Model appears to fit the data well.
Eastern Europe & Central Asia   In Conflict      0.414              5.991           0.813         Model appears to fit the data well.
                                Not in Conflict  18.392             15.507          0.018         Model does not fit the data well.
Latin America                   In Conflict      1.425              5.991           0.490         Model appears to fit the data well.
                                Not in Conflict  4.812              15.507          0.777         Model appears to fit the data well.
OECD                            In Conflict      0.236              3.841           0.627         Model appears to fit the data well.
                                Not in Conflict  0.347              7.815           0.951         Model appears to fit the data well.
South & East Asia               In Conflict      856.726            15.507          0.000         Model does not fit the data well.
                                Not in Conflict  37.342             15.507          0.000         Model does not fit the data well.
Sub-Saharan Africa              In Conflict      6.440              15.507          0.598         Model appears to fit the data well.
                                Not in Conflict  48.543             15.507          0.000         Model does not fit the data well.

Initial results indicate that 8 of the 12 conditional models appear to provide
satisfactory fits, with the exceptions being: Eastern Europe - Not in Conflict, both South
& East Asia models, and the Sub-Saharan Africa - Not in Conflict model. Analysis of
these four models identified the set of outliers, provided in Table 25, that significantly
contribute to the adverse test results.


Table 25: Hosmer-Lemeshow Test Significant Outliers

Nation                    Year        Transition/Remain in Conflict (0, 1)  Probability  Sub Group

Eastern Europe & Central Asia (Not in Conflict)
Belarus                   2008-2009   0                                     0.974        10

South & East Asia (In Conflict)
Maldives                  2004-2005   0                                     0.965        2
Bangladesh                2007-2008   0                                     0.973        2
Cambodia                  2004-2005   0                                     0.977        2
Korea, North              2010-2011   0                                     0.987        2
Timor-Leste               2008-2009   0                                     0.995        3
China                     2004-2005   0                                     0.999        4
Sri Lanka                 2009-2010   0                                     1.000        7

South & East Asia (Not in Conflict)
Samoa                     2011-2012   1                                     0.005        3

Sub-Saharan Africa (Not in Conflict)
Congo, Republic of the    2006-2007   1                                     0.014        4
Comoros                   2006-2007   1                                     0.015        4
Comoros                   2009-2010   1                                     0.015        4
Mali                      2005-2006   1                                     0.024        5
Mauritania                2007-2008   1                                     0.028        5
Sierra Leone              2010-2011   1                                     0.041        6

While  the  Hosmer-Lemeshow  test  assesses  the  overall  fit  of  the  model  to  the  data, 
the  overarching  objective  of  this  analysis  is  to  identify  and  assess  the  existence  of  any 
significant  model  defects;  this  is  achieved  through  outlier  analysis.  For  the  purposes  of 
this  study,  significant  outliers  are  misclassified  observations  with  assigned  transition 
probabilities  less  than  0.10  for  “In  Conflict”  and  greater  than  0.90  for  “Not  in  Conflict” 
models. In two of the four models, Eastern Europe - Not in Conflict and South & East
Asia - Not in Conflict, the presence of a single outlier indicates a possible issue in model
fit, a highly dubious result given that a single outlier represents approximately 1% of the
total instances for each model. This result is due in part to the method used to calculate
the Hosmer-Lemeshow statistic (C), which heavily penalizes differences in the
number of observed ($o_k$) and expected ($e_k$) occurrences (per bin) when the number of
expected occurrences is small (i.e., $e_k < 0.15$). While a single significant outlier does not
elicit  concern  in  the  overall  suitability  of  a  particular  model,  the  presence  of  multiple 
outliers  may  indicate  the  presence  of  model  defects  that  require  further  investigation. 
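
The sensitivity of the statistic to a single outlier can be illustrated with one bin. The sketch assumes the standard per-bin Hosmer-Lemeshow term, (o_k - e_k)^2 / (e_k (1 - p_k)), and a hypothetical bin of 10 instances; neither the bin size nor the mean probability is taken from the study.

```python
def bin_contribution(n_k, mean_p, o_k):
    """Contribution of one bin to the Hosmer-Lemeshow statistic."""
    e_k = n_k * mean_p  # expected number of events in the bin
    return (o_k - e_k) ** 2 / (e_k * (1 - mean_p))

# Hypothetical bin: 10 instances with a mean fitted probability of 0.01,
# so only 0.1 events are expected.
no_event = bin_contribution(10, 0.01, 0)
one_event = bin_contribution(10, 0.01, 1)

print(round(no_event, 2), round(one_event, 2))
```

In this example one unexpected event raises the bin's contribution from roughly 0.1 to roughly 8.2, more than half of the 15.507 critical value, which is consistent with a single outlier being enough to change the test outcome.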

The  seven  significant  outliers  present  in  the  South  &  East  Asia-In  Conflict  model 
represent  misclassifications  of  nations  predicted  to  remain  in  conflict  but  which 
transitioned  to  a  non-conflict  status  in  the  following  year.  Similarly,  the  six  instances  in 
the  Sub-Saharan  Africa-Not  in  Conflict  model  represent  occurrences  of  nations  predicted 
to  remain  out  of  conflict  but  which  transitioned  to  a  conflict  status  in  the  subsequent  year. 
Given  the  demonstrated  difficulty  of  correctly  classifying  conflict  transitions,  an  audit  of 
the  individual  outliers  was  conducted  to  determine  if  the  assigned  conflict  transition 
probabilities were appropriate for the nation and region. For the South & East Asia - In
Conflict model, the audit revealed that the assigned probabilities were appropriate in five
of the seven instances, the exceptions being Maldives (2004 - 2005) and North Korea
(2010 - 2011), given the average probability of remaining in conflict and the number of
years the nations were in a state of violent conflict between 2004 and 2014. The audit of
the Sub-Saharan Africa - Not in Conflict model determined that the assigned probabilities
were appropriate for three of the five nations, the exceptions being Mali and Mauritania,
which tended to be in a state of conflict, due to above-average political instability, over
the same 11-year period. The results of this audit are provided in Table 26.


Table 26: Audit of Significant Outliers

Nation                    Year        Transition/Remain in Conflict (0, 1)  Probability  Average Probability  Number Years in Conflict Status (2004-2014)

South & East Asia (In Conflict)
Maldives                  2004-2005   0                                     0.965        0.521                3
Bangladesh                2007-2008   0                                     0.973        0.984                10
Cambodia                  2004-2005   0                                     0.977        0.992                8
Korea, North              2010-2011   0                                     0.987        0.551                2
Timor-Leste               2008-2009   0                                     0.995        0.620                4
China                     2004-2005   0                                     0.999        0.999                10
Sri Lanka                 2009-2010   0                                     1.000        0.884                9

Sub-Saharan Africa (Not in Conflict)
Congo, Republic of the    2006-2007   1                                     0.014        0.123                3
Comoros                   2006-2007   1                                     0.015        0.312                3
Comoros                   2009-2010   1                                     0.015        0.312                3
Mali                      2005-2006   1                                     0.024        0.606                9
Mauritania                2007-2008   1                                     0.028        0.452                6
Sierra Leone              2010-2011   1                                     0.041        0.206                2

Overall  Assessment  of  Logistic  Regression  Models 

Given the results of this analysis, each of the 12 conditional logistic regression
models is considered satisfactory and valid for the purposes of this study. Each of the
logistic regression models exhibits excellent to superior levels of discrimination for the
training data sets and adequate discrimination for the validation data sets. Model
accuracies exceeded pre-established benchmarks (80% accuracy) in all 12 training
models and 10 of 12 validation models, with overall model accuracies averaging 86.0%
and 88.5% for the “In Conflict” and “Not in Conflict” models respectively. Assessment
of model fit initially determined that only 8 of 12 models appeared to fit the data;
however, further analysis determined that the transition probabilities assigned to the
“significant outliers” were suitable and acceptable given historical data. Table 27
provides a summary of the results of the various analyses conducted on the conditional
logistic regression models. Given our metrics, we assess as superior the Latin America -
Not in Conflict and OECD - Not in Conflict models due to their overall AUC, accuracy,
and model fit. Additionally, six models are assessed as excellent, while four models are
assessed as satisfactory due to their overall fit of the data.

Table 27: Overall Assessment of Conditional Logistic Regression Models

Overall Assessment of Conditional Logistic Regression Models

Region                          Model            Training Data Set AUC  Overall Model Accuracy  Hosmer-Lemeshow Goodness of Fit Results  Overall Model Assessment
Arab & North African States     In Conflict      0.962                  85.3%                   Model appears to fit the data well.      Model is Excellent
                                Not in Conflict  0.930                  86.7%                   Model appears to fit the data well.      Model is Excellent
Eastern Europe & Central Asia   In Conflict      0.972                  89.6%                   Model appears to fit the data well.      Model is Excellent
                                Not in Conflict  0.946                  84.1%                   Model does not fit the data well.        Model is Satisfactory
Latin America                   In Conflict      0.878                  82.1%                   Model appears to fit the data well.      Model is Excellent
                                Not in Conflict  0.952                  90.2%                   Model appears to fit the data well.      Model is Superior
OECD                            In Conflict      0.914                  88.0%                   Model appears to fit the data well.      Model is Excellent
                                Not in Conflict  0.974                  95.6%                   Model appears to fit the data well.      Model is Superior
South & East Asia               In Conflict      0.938                  86.2%                   Model does not fit the data well.        Model is Satisfactory
                                Not in Conflict  0.932                  87.9%                   Model does not fit the data well.        Model is Satisfactory
Sub-Saharan Africa              In Conflict      0.889                  85.1%                   Model appears to fit the data well.      Model is Excellent
                                Not in Conflict  0.874                  85.5%                   Model does not fit the data well.        Model is Satisfactory

4.3  Analysis  of  Significant  Conflict  Transition  Variables 

While  there  is  significant  benefit  in  accurate  prediction  of  nation-state  violent 
conflicts,  many  of  these  benefits  are  rendered  operationally  irrelevant  without  an 
understanding  of  the  underlying  correlation  and  effects  of  the  significant  predictor 
variables.  This  analysis  seeks  to  assess  the  relative  importance,  based  upon  p-value,  of 
the specific predictor variables within a model and determine how those variables are
correlated with a transition into conflict. Figure 17 provides the basic mapping scheme
for covariate correlation based upon correlation type (positive or negative) and magnitude
(Dark Green - highly negatively correlated; Dark Red - highly positively correlated).
Additionally, in all subsequent analyses, the predictor variables are listed from left to right
in terms of statistical significance, based on their p-value, within the model.


[Figure 17 image not reproduced. Color scale for correlation with Transition to Conflict (Y), ranging from -0.500 through 0.000 to 0.500.]

Figure 17: Covariate Correlation to Dependent Variable

The operational relevance of this analysis is predicated on identifying variables
that can be either monitored or affected in some manner with the goal of controlling a
nation’s transition into or out of conflict. While correlation does not imply causation, this
analysis seeks to enable the influencing of the behavior of these large-scale regional
dynamic systems in a manner beneficial to United States strategic objectives.

Arab  &  North  African  States  -  In  Conflict 

Of  the  four  statistical  variables  employed  in  the  Arab  and  North  African  States  - 
In  Conflict  model,  ethnic  diversity  and  democratic  governments  are  statistically  the  most 
influential  variables  associated  with  conflict  transitions  for  Arab  nations  currently  in 
conflict.  As  seen  in  Figure  18,  ethnic  diversity  is  negatively  correlated  with  transitions 
into  conflict,  implying  that  increasing  a  nation’s  ethnic  diversity  score  (i.e.,  the 
percentage  of  the  population  made  up  by  the  dominant  ethnic  group)  reduces  the 
probability  that  an  Arab  nation  currently  in  conflict  will  remain  in  conflict.  Conversely, 
the  presence  of  democratic  governments  is  positively  correlated  to  a  nation’s  probability 
of remaining in a state of violent conflict. While previous studies have suggested that
the risk of conflict is highest among emerging democracies (Goldstone, et al., 2005), the
significance of this variable is heavily influenced by the conflicts in Algeria, Lebanon,
and Tunisia, the only nations with fully democratic governments within the region during
this time period. In all instances, these nations are classified as being in a state of violent
conflict, with no observed transitions out of that state.


Figure  18:  Arab  &  North  African  States  (In  Conflict)  Covariate  Effects 

A more accurate appraisal of the effects of regime type within this region can be
obtained by comparing the rate of violent conflict by government type. The
Arab & North African data set contains 187 total instances, with 58.3%, or 109
observations, of those instances classified as being in a state of violent conflict. From this
data set, 91 nation-year instances are classified as having Autocratic governments, with
the remaining 96 instances classified as having one of the five alternative regime types.
Overall, the rate of violent conflict in autocratic regimes was 29.7% (27 instances),
significantly lower than the regional average. However, nations listed as having some
other regime type experienced conflict in 85.4% (82) of instances over the 11-year period.
The significance of this finding is the correlation between Arab autocratic governments
and lower probabilities of conflict. Goldstone found similar results in the CIA-funded study,
where he found that the risk of instability was lowest in full autocracies (Goldstone et al.,
2005).
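
The regime-type comparison above is simple ratio arithmetic, and a short sketch makes it easy to re-check. The counts below are the ones quoted in the paragraph; the dictionary layout and the names `counts` and `conflict_rate` are illustrative assumptions, not part of the original analysis.

```python
# Nation-year counts quoted in the text for the Arab & North African data set.
counts = {
    "Autocratic": {"instances": 91, "in_conflict": 27},
    "Other regime types": {"instances": 96, "in_conflict": 82},
}


def conflict_rate(in_conflict, instances):
    """Share of nation-year instances classified as being in violent conflict."""
    return in_conflict / instances


total_instances = sum(c["instances"] for c in counts.values())
total_conflict = sum(c["in_conflict"] for c in counts.values())
regional_rate = conflict_rate(total_conflict, total_instances)

rates = {
    name: conflict_rate(c["in_conflict"], c["instances"])
    for name, c in counts.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
print(f"Regional average: {regional_rate:.1%}")
```

The two subgroup counts add up to the 187 instances and 109 conflict observations reported for the region, so the three percentages are mutually consistent.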

Arab  &  North  African  States  -  Not  in  Conflict 

As  in  the  Arab  &  North  African  “In  Conflict”  model,  ethnic  diversity  is  identified 
as  the  most  significant  of  the  six  variables  employed  in  this  model.  However,  for  nations 
currently not in conflict, higher ethnic diversity scores are correlated with an increased
likelihood that such a nation will transition into conflict in the following year. Since
many  of  the  same  nations  are  present  in  both  the  “In  Conflict”  and  “Not  in  Conflict”  data 
sets,  such  a  finding  implies  that  an  imbalance  exists  in  the  region’s  ethnic  diversity, 
exacerbating  the  overall  instability  of  the  region.  In  addition  to  ethnic  diversity, 
increased  religious  diversity  scores  (%  of  the  population  comprised  by  largest  religious 
group),  death  rates,  and  youth  populations  are  correlated  with  transitions  into  conflict. 
Additionally, these variables are positively correlated with each other, indicating
likely interdependencies between these predictor variables. On the other hand, greater
average life expectancies are correlated with lower incidences of transitions into conflict,
though this result may simply reflect the fact that life expectancies should logically be greater
when violent conflicts are not taking place. The summary of variable effects and
correlations is provided in Figure 19.

Source                      Chi-Sq    Prob > Chi-Sq
Ethnic Diversity            16.895    <.0001
Religious Diversity         16.471    <.0001
Life Expectancy             16.262    <.0001
Death Rate                  14.099    0.0002
Youth Bulge                 12.768    0.0004
Military Expend (% GDP)      7.605    0.0058

Figure  19:  Arab  &  North  African  States  (Not  in  Conflict)  Covariate  Effects 
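
The "Prob > Chi-Sq" column in the effect tables can be related to the "Chi-Sq" column with a short calculation. The sketch below assumes each effect test has one degree of freedom, which the text does not state; under that assumption the upper-tail probability reduces to the complementary error function. The function name `chi_square_p_value_1df` is hypothetical.

```python
import math


def chi_square_p_value_1df(chi_sq):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom.

    For 1 df, P(X > x) = erfc(sqrt(x / 2)). The single degree of freedom is an
    assumption made for this sketch, not a detail given in the source.
    """
    return math.erfc(math.sqrt(chi_sq / 2.0))


# Chi-square statistics reported in Figure 19.
figure_19 = {
    "Ethnic Diversity": 16.895,
    "Religious Diversity": 16.471,
    "Life Expectancy": 16.262,
    "Death Rate": 14.099,
    "Youth Bulge": 12.768,
    "Military Expend (% GDP)": 7.605,
}

for source, chi_sq in figure_19.items():
    print(f"{source}: chi-sq = {chi_sq:.3f}, p = {chi_square_p_value_1df(chi_sq):.4f}")
```

Larger statistics map to smaller probabilities, which is why the table's ordering by Chi-Sq is also an ordering by significance.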

Eastern  Europe  &  Central  Asia  -  In  Conflict 

Analysis of the variables associated with conflict transitions of eastern European
and central Asian nations currently identified as being in a state of conflict identifies a
nation’s international trade level, as a percentage of its gross domestic product (GDP), as
the most significant variable within the model. Trade is identified as being negatively correlated
with a state remaining in conflict, an expected result given that stable and less violent
nations should have higher levels of international trade. Similar to other regional models,
population statistics (specifically those correlated with increased youth populations and high
densities) are correlated with increased incidences of transitions into conflict. As seen in
Figure 20, fertility rates, infant mortality rates, and population density are all positively
correlated with transitions into conflict, and with each other. This finding indicates that a
reduction in one of the variables, such as “Fertility Rate”, may result, over time, in
subsequent decreases in a nation’s infant mortality rate, population density, or both, with a
corresponding decrease in the probability of violent conflict.

Source                    Chi-Sq    Prob > Chi-Sq
Trade (% GDP)             40.106    <.0001
Fertility Rate            24.026    <.0001
Infant Mortality Rate     11.693    0.0006
Population Density        11.361    0.0008
Freedom Score              6.395    0.0114

Figure  20:  Eastern  Europe  &  Central  Asia  (In  Conflict)  Covariate  Effects 

Eastern  Europe  &  Central  Asia  -  Not  in  Conflict 

A total of nine variables were identified as significant for eastern European and
central Asian nations currently in a state of non-conflict. Of note is the significance
associated with regime type, specifically those governments identified as either emerging
democracies or experiencing foreign interruption of their political processes, which is
given in Figure 21. As noted earlier, the existence of transitional or emerging
governments is highly correlated with violent conflict, which makes logical sense due to
the loss of government function and continuity. As was the case for the Arab nation
models, this finding is only part of the story. For the period of 2004 to 2014, there are 308
total instances in the Eastern Europe & Central Asian data set; of these, 125 instances
(40.6%) are identified as being in a state of conflict. However, unlike in the Arab and North
African models, democratic nations within the region are less likely to be in a state of
violent conflict. Only 45 (25.6%) of the 176 instances involving
democratic governments were identified as being in a state of conflict. Further analysis
revealed that of the 16 nations identified as having democratic governments, only
Pakistan is located outside of Eastern Europe, indicating that government type may not
provide the operational fidelity required for conflict prediction and forecasting within this
region.

Source                                    Chi-Sq    Prob > Chi-Sq
Government Type (Foreign Interruption)    20.532    <.0001
3 Yr Freedom Trend                        18.373    <.0001
Government Type (Emerging)                16.741    <.0001
Improved Water                            15.752    <.0001
GDP Per Capita                            15.654    <.0001
2 Yr Conflict Intensity Trend             11.392    0.0007
Religious Diversity                       10.846    0.001
Arable Land                                9.767    0.0018
2 Yr Freedom Trend                         7.241    0.0071
Government Type (Democratic)               0.235    0.6282

Figure  21:  Eastern  Europe  &  Central  Asia  (Not  in  Conflict)  Covariate  Effects 

Of  the  remaining  variables,  access  to  improved  water  sources  and  the  GDP  per 
Capita  were  both  highly  significant  and  negatively  correlated  with  transitions  into 
conflict.  Within  this  model,  these  variables  represent  likely  candidates  that  can  be 
monitored,  manipulated  and  improved  through  the  judicious  application  of  the 
diplomatic,  information,  military,  and  economic  elements  of  national  power,  resulting  in 
a  possible  reduction  in  the  total  number  of  future  transitions  into  conflict. 

Latin  America  -  In  Conflict 

Figure  22  provides  the  covariates  for  this  model.  Non-autocratic  functioning 
governments  are  highly  correlated  with  increased  levels  of  violence  in  Latin  American 
nations,  with  95%  of  the  conflict  incidences  occurring  in  these  nations.  Fully  democratic 
nations  account  for  21  of  the  27  nations  within  the  Latin  American  data  set  and 
subsequently  account  for  a  majority  of  the  conflict  transitions  that  occur  within  the 
region.  However,  nations  identified  as  having  emerging  democratic  governments,  such 
as Ecuador, Suriname, or Venezuela, are nearly twice as likely to remain in conflict as
their fully democratic neighbors. Increased religious diversity and freedom scores are
correlated with transitions out of violent conflict, indicating that increasing the
percentage of the population made up by the religious majority or increasing individual
liberties may result in increased incidences of transitions to a non-conflict state. The
CIA-funded study yielded similar results, showing that increased factionalism due to
ethnic and religious differences was positively correlated with political instability
(Goldstone et al., 2005).

Figure 22: Latin America (In Conflict) Covariate Effects

Latin  America  -  Not  in  Conflict 

As shown in Figure 23, nations currently not in conflict with higher ethnic
diversity scores tend to experience fewer transitions into conflict than nations with more
diverse populations. However, ethnic diversity is positively correlated with religious
diversity, which is shown to have a moderate destabilizing effect for countries not in
conflict. Similar to the Arab and North African nations, there appears to be an imbalance
with regard to the region’s ethnic and religious demographics that may aggravate
regional discord. On the other hand, access to improved water sources appears to be
correlated with fewer transitions into violent conflict. However, this finding may
also be the result of a more permissive environment allowing for improved access to
fresh water.

Figure 23: Latin America (Not in Conflict) Covariate Effects

OECD  -  In  Conflict 

Nations belonging to the Organization for Economic Cooperation and
Development (first world nations) experience violent conflict rates 50% below the world
average. However, as in other regions, increased youth populations within OECD nations
are correlated with increased levels of violence and the tendency for nations to transition
into or remain in a state of conflict. While not identified as significant variables within the
final “In Conflict” model, population migrations represented by the two “Refugee”
variables are correlated with transitions into conflict as well as with increased youth
populations and military expenditures within OECD nations. With regard to population
migrations, historically refugees are 2.2 times more likely to seek asylum in an OECD
nation than to originate from one. According to the 2014 HIIK Conflict Barometer, conflicts
arising from population migrations have resulted in, or contributed to, many of the
violent conflicts experienced by OECD nations, with noted examples being the ongoing
immigration and border conflict between the United States and Mexico, violence
associated with refugee and immigrant populations in France, and the ongoing refugee
crisis along Turkey’s southern borders with Iraq and Syria. The covariate effects
for both OECD models are summarized in Figures 24 and 25.


Figure  24:  OECD  (In  Conflict)  Covariate  Effects 
OECD  -  Not  in  Conflict 

As  in  the  “In  Conflict”  model,  defense  expenditures  and  youth  populations  are 
considered  significant  predictors  of  conflict  transitions  for  nations  currently  not  in  a  state 
of  conflict.  Again,  population  migrations  are  highly  correlated  to  many  of  the  significant 
variables  within  this  model,  underpinning  the  importance  of  this  emerging  global  trend  in 
national  and  regional  stability  and  security.  Common  to  all  regions,  improvements  in  the 
overall  quality  of  life,  measured  through  proxy  variables  such  as  death  rates  and  average 
life  expectancy  are  correlated  with  decreased  levels  of  violence,  even  if  such  predictor 
variables are not identified as significant within the final model(s).

Source                     Chi-Sq    Prob > Chi-Sq
Military Expend (% GDP)    21.345    <.0001
Infant Mortality Rate      14.897    0.000
Caloric Intake                       0.000
Youth Bulge                          0.003
Death Rate                  8.375    0.004
Birth Rate                  4.445    0.035

Figure  25:  OECD  (Not  in  Conflict)  Covariate  Effects 
South  &  East  Asia  -  In  Conflict 

As shown in Figure 26, increased levels of population growth are correlated with
transitions out of conflict. This relatively counterintuitive finding is correlated with
improvements in overall quality of life and is influenced by many of the island nations
within the Pacific that have higher population growth percentages and lower levels of
violence than many of the mainland and coastal Asian nations. Government type is
considered a highly significant predictor variable within this region, with democratic
governments experiencing rates of conflict above regional averages. However, unlike in
other regions, fully autocratic governments do not offer significant improvements to
out-of-conflict transition rates, and they seem as likely to perpetuate ongoing conflicts as any
other government type. Finally, as seen in other regional models, increasing trade levels
are correlated with decreased levels of violence and positively correlated with military
expenditures, which may also bring about transitions out of conflict.

Figure 26: South & East Asia (In Conflict) Covariate Effects

South  &  East  Asia  -  Not  in  Conflict 

As shown in the “In Conflict” model, increases in military expenditures are
associated with transitions to non-conflict status for all nations within South and East
Asia. This variable, which is positively correlated with a nation’s trading ability, may
result in improvements to internal security apparatuses within many of these nations,
resulting in decreased levels of violence. However, the ten-year trend within the region
has shown a general increase in military spending for all nations, which may indicate a
developing arms race, with the potential for increased cross-border conflicts. Previous
studies, notably the Boekestein study, have also identified the significance of trade,
caloric intake, and refugee migrations as conflict predictor variables within South and
East Asia. Additionally, improvements in overall quality of life, measured through proxy
variables such as death rates and life expectancy, are positively correlated with
improvements in and access to food supplies and potable water. The covariate correlations
for this conditional model are provided in Figure 27.

Figure 27: South & East Asia (Not in Conflict) Covariate Effects

Sub-Saharan  Africa  -  In  Conflict 

Population demographics positively correlated with increases in population density,
such as increases in youth populations, birth rates, and refugees, appear to exacerbate and
prolong existing conflicts in Sub-Saharan African nations. It also appears that nations with
more diverse populations, owing to the predominantly tribal cultures found in these
nations, are more at risk for violent conflict than nations with higher ethnic
diversity scores. Again, improvements in quality of life statistics, in this case available
fresh water and life expectancy, are correlated with out-of-conflict transitions. Over the
11-year period, Sub-Saharan Africa experienced conflict in 253 (47%) of the 539 observed
instances. Government type was identified as being significant within this conditional
model. Predominantly, Sub-Saharan African governments are categorized as either
emerging democracies (23 nations) or full democracies (22 nations), with only Eritrea
and Swaziland identified as having fully autocratic governments as of 2014. Within this
region, emerging democracies are twice as likely to experience violent and sustained
conflicts as fully democratic nations, most likely because of the inherent instability
of their governments. Figure 28 provides the covariate effects for the Sub-Saharan
Africa - In Conflict model.

Figure 28: Sub-Saharan Africa (In Conflict) Covariate Effects

Sub-Saharan  Africa  -  Not  in  Conflict 

As shown in Figure 29, and as in other regions, arable land appears to be a source of
instability and a confounding factor for conflict transitions. Historical and recent conflicts
over arable land have generally arisen due to either actual or perceived scarcity of the
resource, with the general conclusion being that limited availability of and access to arable
land leads to conflict (Black, 2010). However, as in other regional models, arable land is
identified as being positively correlated with violent conflict, implying that increasing the
supply of this resource will lead to increased levels of violence, which is contradictory to
previous studies. Analysis of this and other regional models has shown that arable land is
also positively correlated with such statistics as increased population densities, youth
populations, and increased numbers of refugees seeking asylum, all of which have
demonstrated a positive correlation with instances of violent conflict across the globe.
Essentially, it appears that nations in Sub-Saharan Africa with increased food production
capacities are at a moderately greater risk of violent conflict associated with a
corresponding increase in their populations due to procreation and migration.


Transition to Conflict (Y) | 1  -0.207  0.279  -0.201  0.121  0.189  0.107  0.095 |

Analysis of Effects

Source                    Chi-Sq    Prob > Chi-Sq
Freedom Score             16.268    <.0001
Arable Land               15.462    <.0001
Trade (% GDP)             12.283    0.001
Birth Rate                10.673    0.001
Infant Mortality Rate      9.290    0.002
Youth Bulge                7.406    0.007
Refugee (Asylum)           5.137    0.023

Figure  29:  Sub-Saharan  Africa  (Not  in  Conflict)  Covariate  Effects 
Summary 

A  total  of  30  variables,  including  the  different  levels  of  the  Government  and 
Regime  type  variables,  were  employed  in  the  construction  of  the  12  conditional  logistic 
regression  models.  Table  28  provides  the  ranking  of  variables  in  terms  of  statistical 
significance  for  each  conditional  model,  with  variables  listed  in  terms  of  overall  world 
view  significance.  Ethnic  diversity,  youth  bulge,  military  expenditure  by  percentage  of 
GDP,  infant  mortality  rate,  and  religious  diversity  were  identified  as  the  five  ordinally 
most  significant  variables  at  the  combined  world  level,  based  upon  their  weighted 
average rankings. Studies conducted by the Peace Research Institute of Oslo (PRIO)
also found similar variables highly significant, lending credence to this finding (Urdal,
2002). Ethnic diversity, which is significant in 5 of the 12 logistic regression models and
is the single most significant variable in Arab and North African states, is negatively
correlated with nations transitioning into or remaining in conflict. Additionally, increased
youth populations, which are also significant in five models, are positively correlated with
increased levels of violence. Of note, the various levels of the government and regime
type variables were identified as statistically significant in 6 of the 12 models. “In
Conflict” models tend to employ government or regime type variables more frequently
than “Not in Conflict” models.


Table  28:  Ranking  of  Variables  in  terms  of  Model  Statistical  Significance 


Regions: Arab & North Africa; Eastern Europe & Central Asia; Latin America; OECD;
South & East Asia; Sub-Saharan Africa (each with an In Conflict and a Not In Conflict model)

Variable                                  Ranks (listed left to right across the model columns in which the variable is significant)
Ethnic Diversity                          1, 1, 1, 3, 4
Youth Bulge                               3, 1, 4, 2, 6
Military Expend (% GDP)                   6, 1, 4, 2
Infant Mortality Rate                     3, 5, 2, 3
Religious Diversity                       2, 7, 4, 3
Trade (% GDP)                             1, 5, 3
Caloric Intake                            3, 3, 3
Fresh Water per Capita                    1, 3
Government Type (Foreign Interruption)    1
Population Growth                         1
Government Type (Democratic)              1, 2, 7
Government Type (Emerging)                3, 2, 6, 8
Freedom Score                             3, 5, 1
3 Yr Freedom Trend                        3, 2
Birth Rate                                7, 6, 6, 4
Regime Type (Democratic)                  2, 4
Improved Water                            4, 2
Refugee (Origin)                          5, 1
Arable Land                               8, 4, 2
Life Expectancy                           3, 5
Fertility Rate                            2, 6
Death Rate                                4, 5
Military Expend (% Gov Spending)          2, 9
Government Type (Anarchy)                 3
Regime Type (Emerging)                    4
Population Density                        4
GDP Per Capita                            5
2 Yr Conflict Intensity Trend             6
Refugee (Asylum)                          7
2 Yr Freedom Trend                        9

4.4  Analysis  of  Nation  Specific  Markov  Models 
Overview 

As stated in Chapter 3, the Markov models are intended to provide an operationally
relevant forecast of future conflict trends conditioned on whether a nation is or
is not currently in a state of violent conflict. Conditional probabilities for each nation are
calculated using both the “In Conflict” and “Not in Conflict” models on the 2014 data set,
which is the base year for all Markov models. A Visual Basic (VBA) based Markov
model tool, operating in the Microsoft Excel environment, was developed to generate the
required outputs and aid in the analysis of future conflict trends. In addition to
calculating the conditional probabilities for any future year, this tool also calculates the
sojourn times, mean conflict recurrence times, and long-run conflict probabilities specific
to each of the 182 nations included in this study.
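
The quantities listed above have standard closed forms for a two-state chain, and the following is a minimal sketch of them rather than a reproduction of the VBA tool. It assumes a time-homogeneous chain with states 0 ("Not in Conflict") and 1 ("In Conflict"), takes the mean recurrence time as the reciprocal of the long-run probability, and uses hypothetical transition probabilities; the names `n_step_matrix` and `two_state_summary` are illustrative.

```python
def mat_mult(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def n_step_matrix(p, n):
    """Return the n-step transition matrix P^n for n >= 1."""
    result = p
    for _ in range(n - 1):
        result = mat_mult(result, p)
    return result


def two_state_summary(p01, p10):
    """Textbook summaries of a two-state Markov chain.

    p01: probability a nation not in conflict enters conflict next year.
    p10: probability a nation in conflict leaves conflict next year.
    """
    if not (0.0 < p01 < 1.0 and 0.0 < p10 < 1.0):
        raise ValueError("p01 and p10 must lie strictly between 0 and 1")
    long_run_conflict = p01 / (p01 + p10)
    return {
        "sojourn_not_in_conflict": 1.0 / p01,   # expected years before entering conflict
        "sojourn_in_conflict": 1.0 / p10,       # expected years before leaving conflict
        "long_run_conflict_probability": long_run_conflict,
        "mean_conflict_recurrence_time": 1.0 / long_run_conflict,
    }


# Hypothetical one-year transition probabilities for a single nation.
p01, p10 = 0.2, 0.3
P = [[1.0 - p01, p01],
     [p10, 1.0 - p10]]

P3 = n_step_matrix(P, 3)          # three-year-ahead transition probabilities
summary = two_state_summary(p01, p10)

print("P(conflict in 3 years | not in conflict now):", round(P3[0][1], 4))
print("P(conflict in 3 years | in conflict now):", round(P3[1][1], 4))
for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```

Because each nation has its own pair of conditional probabilities, the same functions would simply be called once per nation.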

Model  Validation 

Analysis  of  Expected  Number  of  States  in  Conflict  for  2014 

As part of a higher-level analysis and validation of the conditional conflict
probabilities, this study compared the global and regional incidence of violent conflict
observed by HIIK with the expected number of nations in conflict determined using the
conditional probabilities calculated by the logistic regression models using 2014 conflict
data and a 0.50 cut point. This analysis does not seek to specifically identify which
nations are in conflict for a particular region, but rather to provide the expected incidence
level by region that can be compared to current global trends. A summary of this
comparison, by region, is provided in Table 29.

Table 29: Comparison of HIIK Observed and Expected Incidences of Conflict using a 0.50 Cut Point for 2014.

Region                           HIIK Observed    Expected Number     HIIK Observed    Expected Number
                                 States Not in    of States Not in    States in        of States in
                                 Conflict         Conflict            Conflict         Conflict
Arab & North African States       3                 4.44              14               12.56
Eastern Europe & Central Asia    17                16.57              11               11.43
Latin America                    13                14.00              14               13.00
OECD                             26                25.99               7                7.01
South & East Asia                15                13.57              13               14.43
Sub-Saharan Africa               24                27.64              25               21.36
World View                       98               102.21              84               79.79

The 2014 conditional conflict probabilities that are subsequently used to develop
the  nation  specific  Markov  models  for  this  study  predict  approximately  80  nations 
experiencing  violent  conflict  and  102  nations  remaining  out  of  conflict,  and  an  expected 
conflict  incidence  rate  of  43.8%.  The  observed  incidence  of  conflict  for  2014  had  84 
nations  experiencing  some  level  of  violent  conflict,  with  98  nations  remaining  in  a  state 
of  no  conflict,  resulting  in  an  observed  conflict  incidence  rate  of  46.2%.  At  the  regional 
level, the absolute difference between the observed and expected incidence of violent conflict
was less than 1.45 in five of the six regions, and 3.64 in the Sub-Saharan Africa region.
The Arab and North African States, followed by Latin America, and South and East Asia
can be expected to experience conflict rates of approximately 50% or greater. Conversely, the conflict
incidence rates for OECD nations are less than half the world average at 21.3%. Overall,
the  conditional  models  provide  a  very  accurate  prediction  of  the  2014  conflict  incidence 
rates  of  each  region,  and  the  world  as  a  whole. 
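
The regional comparison can be re-derived directly from Table 29. The sketch below only repeats the table's figures and computes absolute differences and incidence rates from them; the variable names are illustrative assumptions.

```python
# Table 29 values: (observed not in conflict, expected not in conflict,
#                   observed in conflict, expected in conflict)
table_29 = {
    "Arab & North African States": (3, 4.44, 14, 12.56),
    "Eastern Europe & Central Asia": (17, 16.57, 11, 11.43),
    "Latin America": (13, 14.00, 14, 13.00),
    "OECD": (26, 25.99, 7, 7.01),
    "South & East Asia": (15, 13.57, 13, 14.43),
    "Sub-Saharan Africa": (24, 27.64, 25, 21.36),
}

# Absolute difference between observed and expected states in conflict.
diffs = {region: abs(obs_in - exp_in)
         for region, (_, _, obs_in, exp_in) in table_29.items()}

# Expected conflict incidence rate per region (expected in conflict / nations).
expected_rates = {region: exp_in / (obs_not + obs_in)
                  for region, (obs_not, _, obs_in, exp_in) in table_29.items()}

world_nations = sum(obs_not + obs_in for obs_not, _, obs_in, _ in table_29.values())
world_observed_conflict = sum(row[2] for row in table_29.values())
world_expected_conflict = sum(row[3] for row in table_29.values())

for region in table_29:
    print(f"{region}: |obs - exp| = {diffs[region]:.2f}, "
          f"expected rate = {expected_rates[region]:.1%}")
print(f"World: observed rate = {world_observed_conflict / world_nations:.1%}, "
      f"expected rate = {world_expected_conflict / world_nations:.1%}")
```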

Forecasting  Validation 

The validation of the nation-specific Markov models presented an interesting
challenge due to the inability to foresee all future events with 100 percent certainty. As a
result, we looked to the past to develop a validation set to compare against the Markov
models using conditional probabilities calculated using 2014 conflict data. To validate
our 2014 Markov models, we constructed another set of Markov models having conditional
probabilities calculated using 2011 conflict data; these models are hereafter referred to as
the 2011 Markov models. This set of 2011 Markov models forecasts the
2014 conflict probabilities, which are then compared to the conflict probabilities
calculated using 2014 conflict data to assess the level of deviation between the two sets of
models. The 2011 data set was selected for validation purposes because it contains
nearly all the data used in the construction of the logistic regression models and because its
relatively recent timeframe more closely resembles conditions present in the 2014
operational environment. The purpose of this validation is to ascertain the fidelity of the
2014 conditional conflict probabilities by comparing their deviations from the 2014
probabilities predicted using 2011 conflict data. The deviation is calculated using
Equation 27.

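\[
\begin{aligned}
\text{Deviation (Not in Conflict)} &= \left| \, {}_{2011}P_{00}^{3} - {}_{2014}P_{00} \, \right| \\
\text{Deviation (In Conflict)} &= \left| \, {}_{2011}P_{10}^{3} - {}_{2014}P_{10} \, \right| \\
\text{Average Deviation} &= \frac{\text{\% Deviation (Not in Conflict)} + \text{\% Deviation (In Conflict)}}{2}
\end{aligned}
\]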

Equation  27:  Markov  Validation 
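
A minimal sketch of this validation calculation follows. It rests on two assumptions that the extracted text does not confirm: that the 2011 term is a three-step (three-year) transition probability, and that the 2014 term is the one-step probability from the 2014 model. The transition matrices are hypothetical, and `markov_deviation` is an illustrative name. Because each row of a transition matrix sums to one, measuring the deviation on either entry of a row gives the same value.

```python
def mat_mult(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_pow(p, n):
    """Return P^n for n >= 1."""
    result = p
    for _ in range(n - 1):
        result = mat_mult(result, p)
    return result


def markov_deviation(p_2011, p_2014, steps=3):
    """Deviation between a 2011 model projected `steps` years and a 2014 model.

    State 0 is "Not in Conflict" and state 1 is "In Conflict" (assumed ordering).
    Returns (not-in-conflict deviation, in-conflict deviation, average deviation).
    """
    projected = mat_pow(p_2011, steps)
    dev_not_in_conflict = abs(projected[0][0] - p_2014[0][0])
    dev_in_conflict = abs(projected[1][0] - p_2014[1][0])
    average = (dev_not_in_conflict + dev_in_conflict) / 2.0
    return dev_not_in_conflict, dev_in_conflict, average


# Hypothetical transition matrices for one nation.
p_2011 = [[0.8, 0.2],
          [0.3, 0.7]]
p_2014 = [[0.7, 0.3],
          [0.4, 0.6]]

dev_not, dev_in, dev_avg = markov_deviation(p_2011, p_2014)
print(f"Not in Conflict deviation: {dev_not:.4f}")
print(f"In Conflict deviation:     {dev_in:.4f}")
print(f"Average deviation:         {dev_avg:.4f}")
```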

These equations were applied to the 2011 and 2014 Markov models for all 182
nations considered in this study. The validation process then analyzed the statistics for the
entire set of models, which are provided in Table 30. On average, the difference between
the 2014 Markov models and the 2011 models predicting 2014 was 0.12 with a variance
of 0.016. Additionally, a total of 152 of the 182 models had an average difference less than
0.25. Only the Ukrainian model experiences deviations greater than 0.50 for both the
“Not in Conflict” and “In Conflict” conditional probabilities; this result is attributed to
the ongoing conflicts in Crimea that significantly escalated in intensity in late 2013 and
early 2014, and is considered reasonable.

Table 30: Markov Model Validation Statistics

Category                   Average Difference    Variance    Number    Number    Number    Number    Number
                           Between Models                    ≤ 0.05    ≤ 0.10    ≤ 0.25    ≤ 0.50    > 0.50
Non-Conflict Deviation     0.1227                0.0369      100       22        32        16        12
Conflict Deviation         0.1211                0.0297      100       20        27        26         9
Average Model Deviation    0.1219                0.0158       69       32        51        29         1

Additionally, Markov model accuracy was assessed by comparing the 2014
conflict forecasts created by the Markov models developed using 2011 data with the
observed 2014 conflict status of each nation. In total, the
2011 Markov models correctly classified the conflict status of 155 of the 182 nations, for
a total forecast accuracy of 85.16%. Given the high number of nation models that
experience average deviations less than 25%, and the number of significant events that
have occurred since 2011 (the Arab Spring, the rise of the Islamic State, the Crimean
conflict, etc.), the 2014 Markov models appear to be valid representations of current conflict
transition probabilities.

Analysis  of  HIIK  Conflict  Intensity  Levels  and  Conflict  Probability 

As  part  of  the  model  validation  process,  this  study  analyzed  the  conditional 
conflict  probabilities  as  they  relate  to  the  HIIK  levels  of  violence.  The  theory  behind  this 
analysis  is  that  there  should  exist  a  strongly  positive  correlation  between  a  nation’s 
conditional  probability  of  conflict  and  its  level  of  violence  in  2014.  As  part  of  this 
analysis,  the  HIIK  levels  of  violence  were  mapped  to  the  corresponding  ranges  of 
probabilities  shown  in  Table  31,  with  the  assumption  that  the  HIIK  levels  of  violence  are 
linear and well scaled.

Table 31: HIIK Intensity Bin Assignments

HIIK Bin    > Lower Bound    < Upper Bound
0           0.000            0.167
1           0.167            0.333
2           0.333            0.500
3           0.500            0.667
4           0.667            0.833
5           0.833            1.000

Nations are then assigned to a HIIK bin based upon their assigned conditional
conflict probability. The average HIIK score, based upon the nations’ actual conflict
intensity for 2014, is then calculated for each bin as shown in Figure 30.
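
The binning step described above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used in the study: the six bins are treated as equal sixths of the unit interval, a probability that falls exactly on a boundary is placed in the higher bin (the table does not say how ties are handled), and the example records are hypothetical. The names `hiik_bin` and `average_score_by_bin` are illustrative.

```python
def hiik_bin(probability, n_bins=6):
    """Map a conditional conflict probability in [0, 1] to a bin 0..n_bins-1.

    Boundary values go to the higher bin, except 1.0, which stays in the top
    bin. This tie rule is an assumption made for the sketch.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return min(int(probability * n_bins), n_bins - 1)


def average_score_by_bin(records):
    """Average observed HIIK intensity per assigned bin.

    records: iterable of (conditional conflict probability, observed HIIK level).
    """
    totals = {}
    for probability, hiik_level in records:
        b = hiik_bin(probability)
        count, total = totals.get(b, (0, 0.0))
        totals[b] = (count + 1, total + hiik_level)
    return {b: total / count for b, (count, total) in sorted(totals.items())}


# Hypothetical (probability, observed 2014 HIIK level) pairs.
records = [(0.05, 0), (0.10, 1), (0.40, 2), (0.45, 3), (0.90, 5)]
averages = average_score_by_bin(records)

for b, avg in averages.items():
    print(f"Bin {b}: average HIIK score = {avg:.2f}")
```

A nation whose observed HIIK level sits far from its assigned bin would stand out in such a tabulation as a candidate outlier.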


Average  2014  HIIK  Score  per  Bin 


HIIK  Conflict  Intensity  Bin  Level 


Figure  30:  Average  HIIK  Conflict  Intensity  Levels  by  Bin 

As can be seen, the average HIIK score is positively correlated with its bin
assignment, with a calculated correlation of 0.731. However, the average HIIK score
does not strictly increase over the entire bin range, as noted by the decrease from Bin 2 to
Bin 3. The decrease in the average HIIK score for Bin 3 may be the result of one or more outliers
that significantly decrease the average score within the bin. Such points would be
significant false positives or false negatives: nations assigned either a very low or very
high conditional conflict probability relative to their actual level of violence. Figure 31
provides a visual depiction of bin assignments versus conflict intensity for 2014.


[Figure: plot titled "Conditional Conflict Probability vs. 2014 HIIK Intensity Level"; horizontal axis: Conflict Probability (Bin Ranges)]

Figure 31: Identification of Significant Outliers by HIIK Bin

A total of eight possible significant outliers were initially identified, based upon
an absolute deviation between HIIK conflict intensity and assigned bin greater
than or equal to two (with the exception of points having a HIIK level of 3); these points are
marked by the circles in Figure 31. The identified outliers consist of: Libya (Bin 0),
Egypt (Bin 2), Cameroon (Bin 3), Panama (Bin 4), Kiribati (Bin 5), Qatar (Bin 5), Oman
(Bin 5), and the United Arab Emirates (Bin 5). A formal outlier analysis was conducted
to verify these outliers and to identify other potential outliers within the data. Outliers were
identified through examination of the scaled “R-studentized” residuals, a method
commonly used in linear regression to identify extreme points that are considerably
different from a majority of the data (Montgomery, Peck, & Vining, 2012). This
process identified the nine nations listed in Table 32 as being possible significant outliers
in their respective models if they fail to transition from their 2014 conflict status by 2015.

Table 32: Significant Outliers, 2014

Nation                  HIIK Bin    Conflict Probability    HIIK Intensity Level
Gabon                   0           0.004                   3
Kyrgyzstan              0           0.008                   3
Tunisia                 0           0.009                   3
Vietnam                 0           0.100                   3
Libya                   0           0.000                   5
Kiribati                5           0.863                   0
Qatar                   5           1.000                   0
Oman                    5           1.000                   1
United Arab Emirates    5           1.000                   1
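For readers unfamiliar with R-studentized (externally studentized) residuals, the following sketch shows how they may be computed for a simple linear regression. It is an illustration only: the text does not specify which regression the residuals were taken from, so the choice of regressing HIIK intensity on conflict probability, the sample data, and the cutoff of 2 are all assumptions made here.

```python
from math import sqrt


def r_student(xs, ys):
    """Externally studentized residuals for the fit y = b0 + b1 * x."""
    n = len(xs)
    p = 2  # fitted parameters: intercept and slope
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b0 = my - b1 * mx
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(e * e for e in resid)
    out = []
    for x, e in zip(xs, resid):
        h = 1.0 / n + (x - mx) ** 2 / sxx  # leverage of this point
        # Residual variance estimated with this point deleted.
        s2_deleted = (sse - e * e / (1.0 - h)) / (n - p - 1)
        out.append(e / sqrt(s2_deleted * (1.0 - h)))
    return out


# Hypothetical data: conflict probabilities and HIIK levels with one
# gross false positive (high probability, no observed violence).
probs = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
levels = [0, 1, 1, 2, 2, 3, 3, 4, 0, 5]

t = r_student(probs, levels)
CUTOFF = 2.0  # assumed flagging threshold, not taken from the study
flagged = [i for i, ti in enumerate(t) if abs(ti) > CUTOFF]
```

Because each residual is scaled by a variance estimate that excludes the point itself, a single extreme nation cannot mask its own deviation, which is why the method suits the false-positive and false-negative cases listed in Table 32.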

A majority of these outlier nations are from the Arab & North African States
region, with Tunisia and Libya identified as possible significant false-negative
classifications, and Qatar, Oman, and the United Arab Emirates (UAE) identified as
possible significant false-positive classifications. This result further highlights the
extreme instability within the region and the effects of the Arab Spring, which hinder conflict
transition analysis. Removal of these outliers results in the plot provided in Figure 32.
Comparison of this plot with that shown in Figure 30 shows an improvement in the
overall linearity of the plot, and the expected positive correlation between the average
conflict intensity and HIIK bin level. Such a result indicates that we have identified all
significant outlier nations for 2014. Ultimately, the objective of this analysis is the
identification, as opposed to the removal, of possible significant misclassified nations for
the purpose of monitoring both the Markov model outputs and future conflict status for
consistency and accuracy.


[Figure: chart titled "Average 2014 HIIK Score per Bin (Outliers Removed)"; horizontal axis: HIIK Conflict Intensity Bin Level]

HIIK Bin    < Upper Bound    Average 2014 HIIK Score
0           0.167            0.941
1           0.333            1.364
2           0.500            2.600
3           0.667            3.000
4           0.833            3.250
5           1.000            3.672

Figure 32: Average HIIK Conflict Intensity Levels with Outliers Removed

Overall Assessment of Markov Models

Analysis of the overall suitability and validity of the nation-specific Markov models
has demonstrated that the tool functions properly and provides accurate calculations
based on the logistic regression model inputs. The validity of the 2014 logistic regression
model inputs was verified by comparing the 2014 conflict transition probabilities with
those predicted from the 2011 conditional probabilities. This comparison yielded satisfactory
results, with 83.5% of all nations experiencing absolute deviations in their respective conflict
probabilities of less than 0.25. The conditional probabilities were then compared with the
HIIK conflict intensity levels, identifying a positive correlation between conflict
probability and a nation’s observed conflict intensity for 2014. As part of this
comparison, analysis of significantly misclassified outliers identified only nine nations
whose models may produce faulty or inaccurate forecasts, based on 2014 predictions, and
may require further refinement in future studies. Finally, comparison of the expected and
observed regional incidences of conflict indicated a high level of accuracy in the models’
ability to predict regional levels of violence, further substantiating the suitability of the
models for forecasting future conflict trends.

Key  Markov  Model  Outputs 

The  objective  of  the  nation-specific  Markov  models  is  to  provide  operationally 
relevant  insights  on  future  conflict  trends.  In  addition  to  conflict  forecasts,  which  will  be 
discussed  at  length  in  the  following  section,  this  study  also  seeks  to  determine  the  sojourn 
times,  long-run  conflict  probabilities,  and  mean  conflict  recurrence  times  for  each  nation. 
It  should  be  understood  that  these  calculations  are  predicated  on  the  assumption  that 
current conditions regarding the 2014 independent conflict variables remain unchanged
within  each  region,  and  that  the  forecasted  trends  may  be  altered  through  the  application 
of  national  power,  Black  Swan  events  (Taleb,  2010),  or  both.  The  complete  table  of 
Markov  model  outputs  is  provided  in  Appendix  C. 

Sojourn times, E[R_j], are simply the expected times until a nation’s nth conflict transition.
For this study, we examine the first and second sojourn times, and their respective
variances, for each nation beginning in the base year 2014. An example of first and
second sojourn times, as well as the 2014 Markov model, is provided in Figure 33.

[Figure: Conflict Transition Probability Markov Chain Tool, Sojourn Times (R_j) panel]

Country: Angola    Year: 2014    Current Status: No Conflict

Status         No Conflict    Conflict
No Conflict    0.7623733      0.237627
Conflict       0.0571556      0.942844

Number of years to 1st transition: 4.21 (variance 13.50)
Number of years to 2nd transition: 21.70 (variance 302.12)

Figure 33: Example of First and Second Sojourn Times

In this example, Angola’s 2014 Markov model indicates that the nation is more
likely to be in a state of conflict, due to the high likelihood (94%) that once Angola
enters into a state of conflict, it will remain in that state the following year. This
tendency is subsequently reflected in Angola’s sojourn times. Given that Angola was in
a state of non-conflict in 2014, it is calculated that Angola will experience its first
transition into conflict in approximately 4.21 years, with a standard deviation of
approximately 4 years. It is therefore likely that Angola will transition into conflict
within the next 8 years. However, as stated previously, once Angola enters into a state of
conflict it is predicted to remain in that state for approximately 17.5 years. The second
sojourn time, in this case the time for Angola to transition back into a state of
non-conflict, is simply the sum of its first sojourn time and its expected time to remain in
conflict, and is calculated to be approximately 21.7 years from 2014, with a standard
deviation of approximately 17 years. The increased variance associated with this
standard deviation can be equated to a higher level of risk, in terms of model accuracy,
due to the prolonged prediction horizon.

While Angola’s sojourn times are representative of many of the nations included
in this study, numerous nations have predicted sojourn times that span hundreds if not
thousands of years. This phenomenon is due to nations having an overwhelming
tendency, as of 2014, to remain in one state or another. Figure 34 provides an example of
significantly long sojourn times for Canada, which is primarily in a state of non-conflict,
and the Central African Republic, which is predicted to spend a vast amount of time in a
state of conflict.
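The sojourn values reported for Angola in Figure 33 can be reproduced with a short calculation. The sketch below assumes that the time spent in a state before leaving it follows a geometric distribution with a constant yearly exit probability, and that successive spells are independent; these assumptions are consistent with the reported figures but are not stated explicitly in the text, and the function and variable names are hypothetical.

```python
from math import sqrt


def holding_time(p_leave):
    """Mean and variance of the years spent in a state before leaving it,
    assuming a geometric holding time with yearly exit probability p_leave."""
    mean = 1.0 / p_leave
    variance = (1.0 - p_leave) / p_leave ** 2
    return mean, variance


# Angola's 2014 transition probabilities as shown in Figure 33.
P_NC_TO_C = 0.237627    # No Conflict -> Conflict
P_C_TO_NC = 0.0571556   # Conflict -> No Conflict

# First sojourn: time until Angola leaves its 2014 state (No Conflict).
first_mean, first_var = holding_time(P_NC_TO_C)

# Expected length of the conflict spell that follows.
stay_mean, stay_var = holding_time(P_C_TO_NC)

# Second sojourn: first sojourn plus the (assumed independent) conflict spell.
second_mean = first_mean + stay_mean
second_var = first_var + stay_var

first_sd = sqrt(first_var)
second_sd = sqrt(second_var)
```

Under the same assumptions, a very small exit probability produces the extreme values discussed above: an exit probability of 2.95E-06 corresponds to an expected wait of roughly 339,000 years.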


[Figure: Conflict Transition Probability Markov Chain Tool, Sojourn Times (R_j) panels for Canada and the Central African Republic]

Country: Canada    Year: 2014    Current Status: No Conflict

Status         No Conflict    Conflict
No Conflict    0.9999971      2.95E-06
Conflict       0.0507637      0.949236

Number of years to 1st transition: 3.39E+05 (variance 1.15E+11)
Number of years to 2nd transition: 3.39E+05 (variance 1.15E+11)

Country: Central African Republic    Year: 2014    Current Status: Conflict

Status         No Conflict    Conflict
No Conflict    0.0285233      0.971477
Conflict       2.198E-12      1

Number of years to 1st transition: 4.55E+11 (variance 2.07E+23)
Number of years to 2nd transition: 4.55E+11 (variance 2.07E+23)

Figure 34: Example of Significantly Long Sojourn Times

As can be seen, the expected time for Canada to transition into a state of conflict
is approximately 339,000 years, indicating a significant preference towards non-conflict.
Similarly, the Central African Republic shows an even greater predilection to remain in
a state of conflict based on the 2014 model. The significantly large variances for these and
similar nations are functions of the extreme time horizons associated with their sojourn
times and indicate that a transition can occur any time within the forecast window. The
operational relevance of these significantly long sojourn times is the insight that certain
nations are not expected to experience a conflict transition within the foreseeable future if
2014 conditions remain unchanged.

The long-run proportion (π_j) of time a nation spends either in a state of conflict
or non-conflict is an indicator of the transience of a nation, that is, its tendency to
transition in or out of conflict. Again, the operational relevance of this statistic is the
identification of nations that tend to be in one state or the other, as well as nations that
experience frequent conflict transitions. In total, 95 nations have long-run probabilities
that indicate a tendency for violent conflict, while 87 nations have long-run probabilities
that indicate a predisposition for non-conflict. An example of these three categories of
long-run conflict probabilities is presented in Figure 35.


[Figure: Conflict Transition Probability Markov Chain Tool, Long Run Conflict Probabilities (π_j) panels; Number of Years into Future = 1]

Country: China    Year: 2014    Status: Conflict

Status         No Conflict    Conflict
No Conflict    0.9112038      0.088796
Conflict       0.0002312      0.999769

Probability Not in Conflict: 0.002596693
Probability in Conflict:     0.997403307

Country: Colombia    Year: 2014    Status: Conflict

Status         No Conflict    Conflict
No Conflict    0.9001821      0.099818
Conflict       0.1116872      0.888313

Probability Not in Conflict: 0.528059099
Probability in Conflict:     0.471940901

Country: Comoros    Year: 2014    Status: No Conflict

Status         No Conflict    Conflict
No Conflict    0.9808778      0.019122
Conflict       0.1170507      0.882949

Probability Not in Conflict: 0.859574436
Probability in Conflict:     0.140425564

Figure 35: Long Run Conflict Probabilities

As can be seen, China’s long-run probability indicates that China is expected to
be in conflict 99.7% of the time, a nearly permanent state of conflict that is reinforced by
its violent history and ongoing internal conflicts. To a lesser extent, Comoros is expected
to be in a state of non-conflict nearly 86% of the time, indicating that the nation has some
region-specific conflict risk factors but possesses a level of stability that limits the overall
incidence of violent conflict. Colombia, however, has long-run conflict probabilities that
predict the nation will spend nearly equal amounts of time in and out of conflict, equating
to a high conflict transience rate. Transience, which will be discussed in depth in the
following section, may be an indicator of a nation’s susceptibility to internal and/or
external forces resulting in a nation’s transition from one conflict status to the other.
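For a two-state chain, the long-run probabilities follow directly from the two off-diagonal transition probabilities. The sketch below applies the standard closed-form stationary distribution of a two-state Markov chain to the values shown in Figure 35; the function and dictionary names are hypothetical, and the study’s own tool may compute these quantities differently.

```python
def long_run(p_nc_to_c, p_c_to_nc):
    """Stationary distribution (pi_no_conflict, pi_conflict) of a two-state
    chain with the given off-diagonal transition probabilities."""
    total = p_nc_to_c + p_c_to_nc
    return p_c_to_nc / total, p_nc_to_c / total


# Off-diagonal transition probabilities read from Figure 35:
# (No Conflict -> Conflict, Conflict -> No Conflict).
FIGURE_35 = {
    "China": (0.088796, 0.0002312),
    "Colombia": (0.099818, 0.1116872),
    "Comoros": (0.019122, 0.1170507),
}

long_run_probs = {name: long_run(*p) for name, p in FIGURE_35.items()}

# A nation is counted as tending toward conflict when its long-run
# conflict probability is at least 0.50 (the study's incidence cutoff).
tends_to_conflict = {name: pi_c >= 0.50
                     for name, (_, pi_c) in long_run_probs.items()}
```

The three nations illustrate the three categories discussed above: a near-permanent conflict state, a roughly even split, and a predominantly non-conflict state.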

Long-run conflict probabilities can be translated into the mean conflict status
recurrence (m_j), or the average number of steps a nation requires to return to its current
state. As shown in Equation 24, the mean recurrence time is calculated by simply taking
the inverse of the long-run conflict probability. While similar to sojourn time, the mean
recurrence is the long-run average of conflict transition steps, and it represents the
predicted number of steps a nation can expect to experience in order to return to either a
state of conflict or non-conflict. Figure 36 provides the mean recurrence steps that
correspond to the long-run probabilities for China, Colombia, and Comoros.

[Figure: Conflict Transition Probability Markov Chain Tool, Mean Conflict Recurrence (m_j) panels; Number of Years into Future = 1]

Country: China    Year: 2014    Status: Conflict

Status         No Conflict    Conflict
No Conflict    0.9112038      0.088796
Conflict       0.0002312      0.999769

Expected steps to transition from No Conflict to No Conflict: 385.11
Expected steps to transition from Conflict to Conflict:       1.00

Country: Colombia    Year: 2014    Status: Conflict

Status         No Conflict    Conflict
No Conflict    0.9001821      0.099818
Conflict       0.1116872      0.888313

Expected steps to transition from No Conflict to No Conflict: 1.89
Expected steps to transition from Conflict to Conflict:       2.12

Country: Comoros    Year: 2014    Status: No Conflict

Status         No Conflict    Conflict
No Conflict    0.9808778      0.019122
Conflict       0.1170507      0.882949

Expected steps to transition from No Conflict to No Conflict: 1.16
Expected steps to transition from Conflict to Conflict:       7.12

Figure 36: Example of Mean Recurrence Times
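The recurrence values in Figure 36 follow from the inverse relationship described above. As a brief illustration, the sketch below inverts the long-run probabilities reported in Figure 35; the function and variable names are hypothetical.

```python
def mean_recurrence(long_run_probability):
    """Mean number of steps to return to a state, taken as the inverse of
    that state's long-run probability."""
    return 1.0 / long_run_probability


# Long-run probabilities from Figure 35: (not in conflict, in conflict).
FIGURE_35_LONG_RUN = {
    "China": (0.002596693, 0.997403307),
    "Colombia": (0.528059099, 0.471940901),
    "Comoros": (0.859574436, 0.140425564),
}

# For each nation: (non-conflict recurrence, conflict recurrence) in steps.
recurrence = {
    name: (mean_recurrence(pi_nc), mean_recurrence(pi_c))
    for name, (pi_nc, pi_c) in FIGURE_35_LONG_RUN.items()
}
```

A state that is rarely occupied in the long run therefore has a long recurrence time, which is why China’s non-conflict recurrence is so much larger than its conflict recurrence.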

Corresponding to an overwhelming probability of remaining in conflict, the
predicted non-conflict recurrence (m_0) in China is approximately 385 steps, equating to an
extremely low transience rate. The conflict recurrence, by contrast, is approximately one
step, meaning that when China is in a state of conflict it is expected to be in conflict
again the following year. As stated earlier, Colombia is predicted to spend nearly equal
amounts of time, over the long run, in states of conflict and non-conflict. This transient
tendency equates to recurrence rates, for both conflict and non-conflict, of approximately
two steps. Given this prediction, Colombia could theoretically experience up to 2.5
conflict recurrences every 10 model steps, possibly resulting in severe and recurrent
instability within the region.

4.5 Forecasting Global Conflict Trends
Overview

The use of nation-specific Markov models enables forecasting of conflict for time
horizons far greater than those possible with logistic regression alone. For this study, we
examine the predicted incidences of violent conflict for 2016, 2019, and 2024, identifying
which nations are predicted to experience significant changes in their conflict
probabilities. As part of this analysis, we then examine the predicted individual
transience of each nation over this ten-year forecasting period, identifying which nations
are predicted to experience a conflict transition rate above regional and world averages.
It should be remembered that the forecasts provided in this study are predicated on the
assumption that regional factors germane to violent conflict remain unchanged from
current conditions throughout the forecast period.

Two,  Five,  and  Ten  Year  Conflict  Forecasts 

World  Overview 

The two-, five-, and ten-year conflict forecasts for each nation were calculated by
raising their specific Markov models, using 2014 conflict transition probabilities, to the
2nd, 5th, and 10th powers. The analysis focused on determining the incidence of conflict at
the regional and world levels by identifying which states had a conflict probability greater
than or equal to 0.50. Additionally, the analysis identified the ten-year conflict trends for
each nation by calculating the difference between the 2014 and 2024 conflict probabilities. The
analysis sought to identify which nations experienced significant, moderate, or slight
changes in the probability of conflict; Table 33 provides the assessment of the change in
conflict over the range of probabilities. Negative changes in conflict probability equate
to a predicted decrease in the level of violence over the ten-year period, while positive
changes equate to an increase in violence over the same time span. Additionally, nations that
experience an absolute change in conflict probability less than or equal to 0.05 are assessed
as having no significant change in their conflict status over the ten-year time horizon.

Table 33: Forecasting Assessment Matrix

Change in P[Conflict]: 2014 to 2024    Assessment of Change in Conflict
ΔP < -0.50                             Significantly Less Conflict
ΔP < -0.25                             Moderately Less Conflict
ΔP < -0.05                             Slightly Less Conflict
-0.05 < ΔP < 0.05                      No Change
ΔP > 0.05                              Slightly More Conflict
ΔP > 0.25                              Moderately More Conflict
ΔP > 0.50                              Significantly More Conflict
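The forecasting procedure and the assessment matrix in Table 33 can be illustrated with a short sketch using Angola’s 2014 transition matrix from Figure 33. The helper names are hypothetical, and one point is an assumption made here rather than something stated in the text: the 2014 conflict probability is taken to be the one-step probability of conflict from the nation’s 2014 state.

```python
def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_pow(m, n):
    """Raise a 2x2 matrix to a non-negative integer power."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


def conflict_probability(m, current_state, years):
    """P(in conflict after `years` steps); 0 = No Conflict, 1 = Conflict."""
    return mat_pow(m, years)[current_state][1]


def assess_change(delta):
    """Classify a change in conflict probability using Table 33."""
    size = abs(delta)
    if size <= 0.05:
        return "No Change"
    direction = "More" if delta > 0 else "Less"
    if size > 0.50:
        degree = "Significantly"
    elif size > 0.25:
        degree = "Moderately"
    else:
        degree = "Slightly"
    return f"{degree} {direction} Conflict"


ANGOLA = [[0.7623733, 0.237627], [0.0571556, 0.942844]]  # from Figure 33
NO_CONFLICT = 0  # Angola's 2014 status

p_2014 = conflict_probability(ANGOLA, NO_CONFLICT, 1)  # assumed baseline
forecasts = {year: conflict_probability(ANGOLA, NO_CONFLICT, year - 2014)
             for year in (2016, 2019, 2024)}
trend = assess_change(forecasts[2024] - p_2014)
in_conflict_2024 = forecasts[2024] >= 0.50
```

Under these assumptions Angola’s conflict probability rises past the 0.50 threshold between the two-year and five-year horizons, and its ten-year change is classified as significantly more conflict.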

A total of 17 nations were identified as having significant changes in their
probabilities of conflict over the ten-year forecast period, and are presented in Table 34.
Twelve of these nations are projected to experience significantly more conflict by 2024,
while only five nations are expected to realize significant decreases in their levels of
violence over the same time frame. In total, 40 of the 182 nations considered in this
study are predicted to experience increases in conflict over the ten-year forecast period.
Additionally, 30 nations are expected to realize net decreases in conflict, and 112 nations
are expected to experience no significant change in their current conflict levels over the
same period.

Table 34: Significant Changes in Conflict Probability Over 10 Year Period

Country                            Region                             10 Year Trend
Libya                              Arab Countries                     Significantly More Conflict
Tunisia                            Arab Countries                     Significantly More Conflict
Kazakhstan                         Eastern Europe and Central Asia    Significantly More Conflict
Romania                            Eastern Europe and Central Asia    Significantly More Conflict
Trinidad and Tobago                Latin America                      Significantly More Conflict
Korea, North                       South and East Asia                Significantly More Conflict
Micronesia, Federated States of    South and East Asia                Significantly More Conflict
Mongolia                           South and East Asia                Significantly More Conflict
Nepal                              South and East Asia                Significantly More Conflict
Timor-Leste                        South and East Asia                Significantly More Conflict
Angola                             Sub Saharan Africa                 Significantly More Conflict
Sierra Leone                       Sub Saharan Africa                 Significantly More Conflict
Honduras                           Latin America                      Significantly Less Conflict
Paraguay                           Latin America                      Significantly Less Conflict
Greece                             OECD                               Significantly Less Conflict
Cambodia                           South and East Asia                Significantly Less Conflict
Burundi                            Sub Saharan Africa                 Significantly Less Conflict

Table 35 provides a summary of the global incidence of conflict and ten-year
conflict trends. As of 2014, 84 of the 182 nations considered in the study were observed
to be in violent conflict, and it is predicted that this global incidence of conflict will
remain constant over the ten-year period. Over the same time frame, however, it is
predicted that 40 nations will experience increased probabilities of conflict, while only
30 nations will realize decreases in their respective conflict probabilities. It
should be noted that changes in conflict probabilities do not necessarily equate to conflict
transitions but instead identify nations that are expected to experience a measurable
change in their current levels of violence. Analysis of the long-term conflict
probabilities, based on 2014 data, indicates that the global incidence of violence is
expected to increase, with a projected 95 (52%) of the 182 nations existing in a state of
violent conflict. The complete two-, five-, and ten-year forecasts for each nation are
provided in Appendix D.

Table 35: Summary of Conflict Forecasts: World View

World View Conflict Trend Statistics

Statistic                               Count    Percentage
Total Nations Considered                182      100%
Total Nations in Conflict 2014          84       46%

Projections
Number Projected in Conflict 2016       79       43%
Number Projected in Conflict 2019       80       44%
Number Projected in Conflict 2024       84       46%

Likelihood Trends 2014-2024
Number trending towards conflict        40       22%
Number trending towards non-conflict    30       16%
Number experiencing no change           112      62%

Arab  &  North  African  States 

The Arab and North African States currently experience the highest rates of
violent conflict among the six geographic regions, with 14 of 17 states experiencing
violent conflict as of 2014, as shown in Table 36. These levels of violence are projected to
increase over the ten-year forecast, with a projected regional violent conflict rate of 100%
by year 2024, given no change in current conditions. These regional conflict rates are
expected to continue indefinitely past the ten-year forecast horizon. It should be noted
that the conflict rates within this region are predicted to cycle between 14 and 17 nations
during the forecast period. This cycling is the result of predicted state transitions by
Egypt, Jordan, Libya, Morocco, and Tunisia during the forecast period. Over the long
run, it is projected that this cycling will cease, and that all 17 nations within the region
will be in a state of conflict given no change to current conditions.

Table 36: Arab & North African States Conflict Forecast Summary

Arab & North African States Conflict Trend Statistics

Statistic                               Count    Percentage
Total Nations in Region                 17       100%
Total Nations in Conflict 2014          14       82%

Projections
Number Projected in Conflict 2016       17       100%
Number Projected in Conflict 2019       14       82%
Number Projected in Conflict 2024       17       100%

Likelihood Trends 2014-2024
Number trending towards conflict        6        35%
Number trending towards non-conflict    0        0%
Number experiencing no change           11       65%

Eastern  Europe  &  Central  Asia 

The Eastern Europe and Central Asia region is projected to experience steady growth in its
rate of violent conflict over the ten-year forecasting period, with a projected conflict
incidence rate of 46% by year 2024. Long-run conflict rates are expected to peak at 61%,
with 17 of the 28 nations existing in a state of conflict. Within the region, violent conflict is
expected to cluster in the Caucasus and the states bordering Afghanistan, while many of
the eastern European and Baltic nations are predicted to remain out of conflict over the
same period. Internecine violence within Russia and Ukraine is predicted to continue
unabated over the next decade, and may lead to increased instability within the
surrounding former Soviet states. The regional summary is provided in Table 37.

Table 37: Eastern Europe & Central Asia Conflict Forecast Summary

Eastern Europe & Central Asia Conflict Trend Statistics

Statistic                               Count    Percentage
Total Nations in Region                 28       100%
Total Nations in Conflict 2014          11       39%

Projections
Number Projected in Conflict 2016       9        32%
Number Projected in Conflict 2019       11       39%
Number Projected in Conflict 2024       13       46%

Likelihood Trends 2014-2024
Number trending towards conflict        8        29%
Number trending towards non-conflict    3        11%
Number experiencing no change           17       61%

Latin  America 

Latin American violent conflict rates are predicted to remain constant over the
forecasting period, with 13 of the 27 nations predicted to experience some level of violent
conflict. Violent conflict is predicted to cluster in South and Central American nations,
while only two Caribbean nations (Jamaica and Trinidad) are projected to be in a state of
conflict by 2024. The forecast also predicts that Brazil, Colombia, and Venezuela will
remain in conflict, with levels of violence remaining constant in Brazil and Venezuela.
Colombia, on the other hand, is projected to experience a moderate decrease in its conflict
probability by 2024, provided current conditions persist. Over the long run, conflict rates are
expected to increase to approximately 70%, with 19 of the 27 nations predicted to be in a
state of violent conflict, a majority of which are located in Central America and northern
South America. The regional summary is provided in Table 38.

Table 38: Latin America Conflict Forecast Summary

Latin America Conflict Trend Statistics

Statistic                               Count    Percentage
Total Nations in Region                 27       100%
Total Nations in Conflict 2014          14       52%

Projections
Number Projected in Conflict 2016       13       48%
Number Projected in Conflict 2019       13       48%
Number Projected in Conflict 2024       13       48%

Likelihood Trends 2014-2024
Number trending towards conflict        6        22%
Number trending towards non-conflict    4        15%
Number experiencing no change           17       63%

OECD 

Currently, the OECD region experiences the lowest rates of violent conflict among
the six geographic regions, a trend that is currently in a state of equilibrium and is
expected to continue over the forecast period. Of the six OECD nations predicted to be
in conflict in 2024, only South Korea is predicted to experience a transition into conflict,
while Chile, Israel, Mexico, Turkey, and the United Kingdom are projected to remain in
conflict for the foreseeable future. Only Poland and the United States are predicted to
experience slight increases in their respective conflict probabilities, while all other
nations will realize either a decrease or no significant change in their conflict
probabilities over the next decade. Long-run incidences of conflict are
expected to drop to 15%, with the nations of Chile, Israel, Mexico, South Korea, and
Turkey remaining in a state of conflict. The regional summary is provided in Table 39.

Table 39: OECD Conflict Forecast Summary

OECD Conflict Trend Statistics

Statistic                               Count    Percentage
Total Nations in Region                 33       100%
Total Nations in Conflict 2014          7        21%

Projections
Number Projected in Conflict 2016       6        18%
Number Projected in Conflict 2019       6        18%
Number Projected in Conflict 2024       6        18%

Likelihood Trends 2014-2024
Number trending towards conflict        2        6%
Number trending towards non-conflict    4        12%
Number experiencing no change           27       82%

South  &  East  Asia 

Rates of violent conflict in the South and East Asia region are projected to
eclipse those of both the Latin America and Sub-Saharan Africa regions, with 17 of the
28 regional nations predicted to be in a state of conflict by 2024. Over the forecast period,
six nations (Laos, Micronesia, Mongolia, Nepal, North Korea, and Timor-Leste) are
predicted to experience transitions into conflict, while only two nations (Cambodia and
Vietnam) are predicted to transition out of conflict. Both China and India are predicted to
remain in conflict over the next decade, with their respective conflict probabilities
remaining nearly constant over the same period. As shown in Table 40, the regional
incidence of conflict cycles over the course of the forecast period due to the transitions
discussed previously. Ultimately, the incidence rate of violent conflict is predicted to
stabilize at 64%, with 18 of the 28 nations experiencing some level of violent conflict.

Table 40: South & East Asia Conflict Forecast Summary

South & East Asia Conflict Trend Statistics

Statistic                               Count    Percentage
Total Nations in Region                 28       100%
Total Nations in Conflict 2014          13       46%

Projections
Number Projected in Conflict 2016       14       50%
Number Projected in Conflict 2019       18       64%
Number Projected in Conflict 2024       17       61%

Likelihood Trends 2014-2024
Number trending towards conflict        8        29%
Number trending towards non-conflict    4        14%
Number experiencing no change           16       57%

Sub-Saharan  Africa 

While the Sub-Saharan Africa region has the most states currently in, and predicted to
be in, violent conflict, it is the only region other than the OECD that is projected to
experience a decrease in its regional rate of conflict over the ten-year forecast. Of the
18 nations predicted to be in conflict in 2024, only Angola, Cameroon, and Sierra Leone
are predicted to transition into conflict; additionally, 10 nations are projected to
transition out of conflict over the same period. Similar to the OECD region, average
conflict probabilities are projected to decrease in Sub-Saharan Africa over the next
decade, with 15 nations projected to have lower probabilities of conflict, while only 10
nations are predicted to have increased conflict probabilities given current conditions.
Over the long run, conflict rates in Sub-Saharan Africa are predicted to stabilize at
39%, with 19 of the 49 nations existing in a state of violent conflict. The regional
summary is provided in Table 41.

Table 41: Sub-Saharan Africa Conflict Forecast Summary

Sub-Saharan Africa Conflict Trend Statistics

Statistic                               Count   Percentage
Total Nations in Region                    49         100%
Total Nations in Conflict 2014             25          51%

Projections
Number Projected in Conflict 2016          20          41%
Number Projected in Conflict 2019          18          37%
Number Projected in Conflict 2024          18          37%

Likelihood Trends 2014-2024
Number trending towards conflict           10          20%
Number trending towards non-conflict       15          31%
Number experiencing no change              24          49%

Analysis  of  Conflict  Transience  in  Nations 

Conflict transience describes a nation’s tendency to transition into and out of conflict
frequently. Highly transient nations, such as Colombia, Morocco, or the United States,
are identified as those having long-run conflict probabilities (π_j) approaching 0.50.
Such conflict probabilities indicate that a nation spends nearly equal amounts of time in
states of conflict and non-conflict, resulting in relatively frequent conflict
transitions. A nation’s transience score is based on the sum of the mean recurrence steps
(M_0, M_1) provided in Equation 24. The expected number of conflict transitions over a
given time period (T) is given by Equation 28. For the purposes of this study, T is set
to 10 years to coincide with the 10-year forecast discussed in the previous section.

Equation 28: Expected Number of Conflict Recurrences for Given Time Period

$$E[\text{Recurrences}] = \frac{T}{\sum_{j=0}^{1} M_j}$$

Where:

T = Time period of interest (years)

M_j = Mean Recurrence (number of steps)

j = Conflict State {0, 1}

For  a  hypothetical  nation  exhibiting  a  long-run  conflict  probability  of  0.50,  the 
sum  of  the  mean  recurrence  times  for  both  conflict  and  non-conflict  is  4  years,  resulting 
in  2.5  expected  recurrences  over  a  10  year  period.  This  hypothetical  nation  is  used  as  the 
transience  benchmark,  against  which  all  nations  are  compared,  yielding  the  Transience 
Score  provided  in  Equation  29. 

Equation 29: Nation Specific Transience Score

$$\text{Transience Score} = \frac{E[\text{Recurrences}]}{2.5}$$
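
As a hedged aside, Equations 28 and 29 can be combined into a closed form in terms of the
long-run probabilities. The sketch below assumes that the mean recurrence steps follow
the standard relation M_j = 1/π_j for an ergodic two-state chain (an assumption here,
although the values tabulated in this section are consistent with it) and that T = 10
years:

```latex
\begin{align*}
M_0 + M_1 &= \frac{1}{\pi_0} + \frac{1}{\pi_1}
           = \frac{\pi_0 + \pi_1}{\pi_0 \pi_1}
           = \frac{1}{\pi_0 \pi_1}, \\
E[\text{Recurrences}] &= \frac{T}{M_0 + M_1} = 10\,\pi_0 \pi_1, \\
\text{Transience Score} &= \frac{10\,\pi_0 \pi_1}{2.5}
                         = 4\,\pi_0 \pi_1
                         = 4\,\pi_1 (1 - \pi_1).
\end{align*}
```

Under these assumptions the score reaches its maximum of 1 when π_1 = 0.50 and falls
towards 0 as π_1 approaches 0 or 1.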

A nation’s Transience Score is used to identify nations that are predisposed to conflict
transitions. Transience Scores approaching one indicate highly transient nations, while
scores approaching zero identify nations that tend to remain in one state over the other.
The Transience Scores for each nation are listed in the regional tables provided in
Appendix E.
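
The calculation in Equations 28 and 29 can be sketched in a few lines of Python. This is
a minimal illustration rather than the study's implementation; the function names are
hypothetical, and the inputs are mean recurrence values of the kind reported in the
transience tables.

```python
def expected_recurrences(m0: float, m1: float, horizon_years: float = 10.0) -> float:
    """Equation 28: T divided by the sum of the mean recurrence steps."""
    return horizon_years / (m0 + m1)


def transience_score(m0: float, m1: float, horizon_years: float = 10.0,
                     benchmark: float = 2.5) -> float:
    """Equation 29: expected recurrences relative to the 2.5 benchmark."""
    return expected_recurrences(m0, m1, horizon_years) / benchmark


# Benchmark nation: M_0 = M_1 = 2, so 10 / 4 = 2.5 recurrences and a score of 1.
print(transience_score(2.0, 2.0))
# A nation that almost never leaves one state (M_1 very large) scores near 0.
print(transience_score(1.0, 1.0e36))
```

Because the tabulated M_0 and M_1 values are rounded to two decimals, scores recomputed
from them may differ slightly from the tabulated scores in the last digits.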

Table 42 provides the top 25 most transient nations identified in this study, 12 of which
were identified as being in conflict in 2014. Libya and Tunisia, which have experienced
relatively few conflict transitions over the past 20 years, were identified as the most
transient nations within the study, each with a score of approximately one. This finding
is attributed to the dynamic changes resulting from the Arab Spring, which began in
Tunisia and quickly spread to Libya and other Arab nations. Similarly, Morocco, Jordan,
and Egypt are also identified as being highly transient based on 2014 data. The United
States is also identified as being highly transient, which concurs with its recent
history of experiencing five conflict transitions between 2004 and 2014. The transience
of the United States is credited in part to its ongoing worldwide military engagements,
instability due to a highly polarized political process, and ongoing conflicts along its
southern border with Mexico resulting from population migration and an increasingly
violent illicit narcotics trade. Seven nations from the Sub-Saharan Africa region are
identified as being highly transient. Similar to the United States, Cameroon and Cote
d’Ivoire also have a history of multiple conflict transitions, experiencing four and two
recurrences respectively between 2004 and 2014.

Table 42: Top 25 Most Transient Nations

                        2014 Conflict                                                 Expected Recurrences  Transience
Nation                  Status         Region                           M_0   M_1   per 10 Years          Score       Rank
Libya                   Conflict       Arab Countries                   2.00  2.00  2.5                   1.00000     1
Tunisia                 Conflict       Arab Countries                   2.01  1.99  2.5                   0.99998     2
United States           Conflict       OECD                             1.94  2.07  2.5                   0.99898     3
Bosnia and Herzegovina  No Conflict    Eastern Europe and Central Asia  1.93  2.07  2.5                   0.99884     4
Cameroon                No Conflict    Sub-Saharan Africa               2.10  1.91  2.5                   0.99774     5
Colombia                Conflict       Latin America                    1.89  2.12  2.5                   0.99685     6
Kiribati                No Conflict    South and East Asia              1.89  2.12  2.5                   0.99677     7
Botswana                No Conflict    Sub-Saharan Africa               1.89  2.13  2.5                   0.99637     8
Malawi                  No Conflict    Sub-Saharan Africa               1.84  2.19  2.5                   0.99228     9
Montenegro              No Conflict    Eastern Europe and Central Asia  1.84  2.20  2.5                   0.99199     10
Benin                   No Conflict    Sub-Saharan Africa               2.20  1.83  2.5                   0.99191     11
Korea, South            No Conflict    OECD                             2.21  1.83  2.5                   0.99112     12
Morocco                 Conflict       Arab Countries                   2.24  1.81  2.5                   0.98884     13
Jordan                  Conflict       Arab Countries                   2.54  1.65  2.4                   0.95477     14
Georgia                 No Conflict    Eastern Europe and Central Asia  2.58  1.63  2.4                   0.94998     15
Vietnam                 Conflict       South and East Asia              1.62  2.63  2.4                   0.94325     16
Bahamas                 No Conflict    Latin America                    2.63  1.61  2.4                   0.94241     17
Laos                    No Conflict    South and East Asia              2.71  1.58  2.3                   0.93130     18
Cote d'Ivoire           Conflict       Sub-Saharan Africa               2.76  1.57  2.3                   0.92453     19
Lesotho                 Conflict       Sub-Saharan Africa               1.56  2.80  2.3                   0.91823     20
Ecuador                 Conflict       Latin America                    2.80  1.55  2.3                   0.91809     21
Egypt                   Conflict       Arab Countries                   2.84  1.54  2.3                   0.91189     22
Mozambique              Conflict       Sub-Saharan Africa               2.87  1.54  2.3                   0.90836     23
Uzbekistan              No Conflict    Eastern Europe and Central Asia  2.95  1.51  2.2                   0.89568     24
Dominican Republic      No Conflict    Latin America                    1.50  3.00  2.2                   0.88888     25

On the other end of the spectrum are the nations with exceedingly low Transience Scores
that are projected to remain in their current conflict states for the foreseeable future.
Table 43 lists the 25 least transient nations, with the Caribbean nations of Cuba and
Antigua and Barbuda identified as the least transient nations within the study. Latin
American and OECD nations account for 12 of the 25 nations and show a propensity to
remain in a state of non-conflict. Of this group, only Suriname and Trinidad and Tobago,
both classified as not in conflict in 2014, are identified as having a predisposition for
long-term conflict. Five Arab nations are also identified within this group, all of which
show a proclivity towards a long-run state of conflict, supporting the results of the
conflict forecast provided in the previous section.

Table 43: Top 25 Least Transient Nations

                                 2014 Conflict                                                         Expected Recurrences  Transience
Nation                           Status         Region                           M_0       M_1       per 10 Years          Score       Rank
Cuba                             No Conflict    Latin America                    1.00      1.00E+36  0.0                   0.00000     1
Antigua and Barbuda              No Conflict    Latin America                    1.00      1.00E+36  0.0                   0.00000     2
Kuwait                           Conflict       Arab Countries                   1.56E+13  1.00      0.0                   0.00000     3
Malta                            No Conflict    Eastern Europe and Central Asia  1.32E+13  1.00      0.0                   0.00000     4
Central African Republic         Conflict       Sub-Saharan Africa               4.42E+11  1.00      0.0                   0.00000     5
Qatar                            No Conflict    Arab Countries                   1.43E+11  1.00      0.0                   0.00000     6
United Arab Emirates             No Conflict    Arab Countries                   1.42E+11  1.00      0.0                   0.00000     7
Bahrain                          Conflict       Arab Countries                   1.42E+11  1.00      0.0                   0.00000     8
South Sudan                      Conflict       Sub-Saharan Africa               2.65E+10  1.00      0.0                   0.00000     9
Pakistan                         Conflict       Eastern Europe and Central Asia  3.93E+09  1.00      0.0                   0.00000     10
Iceland                          No Conflict    OECD                             1.00      2.21E+09  0.0                   0.00000     11
Norway                           No Conflict    OECD                             1.00      6.22E+08  0.0                   0.00000     12
Ireland                          No Conflict    OECD                             1.00      1.43E+08  0.0                   0.00000     13
Suriname                         No Conflict    Latin America                    7.21E+07  1.00      0.0                   0.00000     14
Trinidad and Tobago              No Conflict    Latin America                    4.29E+07  1.00      0.0                   0.00000     15
Denmark                          No Conflict    OECD                             1.00      9.38E+06  0.0                   0.00000     16
Sweden                           No Conflict    OECD                             1.00      8.85E+06  0.0                   0.00000     17
Finland                          No Conflict    OECD                             1.00      5.08E+06  0.0                   0.00000     18
Iraq                             Conflict       Arab Countries                   2.06E+06  1.00      0.0                   0.00000     19
Nepal                            No Conflict    South and East Asia              1.26E+06  1.00      0.0                   0.00000     20
Indonesia                        Conflict       South and East Asia              8.14E+05  1.00      0.0                   0.00000     21
Belgium                          No Conflict    OECD                             1.00      7.16E+05  0.0                   0.00001     22
Micronesia, Federated States of  No Conflict    South and East Asia              4.88E+05  1.00      0.0                   0.00001     23
Netherlands                      No Conflict    OECD                             1.00      3.82E+05  0.0                   0.00001     24
Tajikistan                       Conflict       Eastern Europe and Central Asia  2.70E+05  1.00      0.0                   0.00001     25

Analysis of Table 43 identified seven nations classified as not in conflict in 2014
(Malta, Qatar, United Arab Emirates, Suriname, Trinidad and Tobago, Nepal, and
Micronesia) that show an inclination towards long-term uninterrupted conflict. A
subsequent comparison with each of these nations’ respective ten-year forecasts shows
that five nations are predicted to be in conflict in 2024, and only Malta and Suriname
are forecast to remain in their current state over the same time period. Consequently,
Micronesia, Nepal, Qatar, Trinidad and Tobago, and the United Arab Emirates are
identified as being at risk for near-term transitions into conflict.

The distribution of Transience Scores is presented in Figure 37. As can be seen, 113
(62%) of the 182 nations have Transience Scores less than or equal to 0.20. Additionally,
only 37 (20%) nations have moderate Transience Scores between 0.20 and 0.80, while 32
(18%) nations are classified as being highly transient with scores greater than 0.80.
This result demonstrates the typical finding that nations tend to remain in either a
state of conflict or non-conflict, and that in general national-level conflict
transitions are rare events.

Figure 37: Transience Score Histogram

V. Conclusions and Recommendations


“Be prepared to re-examine your reasoning”

Robert S. McNamara, In Retrospect: The Tragedy and Lessons of Vietnam

5.1  Chapter  Overview 

This chapter provides the research conclusions derived from developing a set of
region-specific conditional logistic regression and Markov models for the prediction and
forecasting of conflict transitions in nations. Section 5.2 provides a summary of the
study’s problem statement, research questions, and methodology. Next, Section 5.3
discusses the significance of the research and its applicability in operational and
strategic level planning. Finally, Section 5.4 discusses possible future research
concerning the prediction and spread of violent conflict in nations.

5.2  Conclusions  of  Research 

This study considered 30 statistical and trend variables in the development of models to
predict future incidences of conflict transitions. Relying on logistic regression, Markov
models, and methodologies proven in previous studies, this research reconfirmed the
validity of using geographic sub-regions to develop conditional logistic regression
models for the 182 nations considered in this study. These models subsequently generated
the conflict transition probabilities used in the set of Markov models, enabling
long-range forecasts of regional and global incidences of conflict seldom seen in
previous analytical efforts. Ultimately, the models developed for this study and the
subsequent analysis answered the five research questions posed in Chapter 1.

Question 1: How accurately can statistical models predict conflict transitions for
individual nations?

A total of 12 conditional logistic regression models were developed for six geographic
regions for this study. These models achieved weighted predictive accuracies, at the
world level, of 88.76% on the training data set and 84.67% on the validation data set.
On the training data set, regional weighted predictive accuracies exceeded 90% in the
Arab and North African States model (93.72%) and the OECD model (93.84%). On the
validation data set, the OECD model achieved a predictive accuracy of 93.46%, far
exceeding the pre-established benchmark of 80% predictive accuracy. In addition to their
overall classification accuracy, the logistic regression models correctly classify 43.6%
of transitions out of conflict and 52.2% of transitions into conflict, a metric
concerning rare events that has never been examined in previous conflict prediction
studies.
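
To make the two kinds of accuracy figures concrete, the following is a minimal sketch of
how an observation-weighted accuracy and a transition-specific classification rate could
be computed. It assumes that "weighted" means weighting each region by its number of
nation-year observations, which this section does not state; the function names and all
counts are hypothetical placeholders, not the study's data.

```python
def weighted_accuracy(regions):
    """Pool per-region results, weighting each region by its observation count.

    `regions` maps a region name to (correct_predictions, total_observations).
    """
    correct = sum(c for c, _ in regions.values())
    total = sum(n for _, n in regions.values())
    return correct / total


def transition_recall(actual, predicted):
    """Share of actual transitions (label 1) that the model also flagged."""
    flagged = [p for a, p in zip(actual, predicted) if a == 1]
    if not flagged:
        raise ValueError("no actual transitions in the sample")
    return sum(flagged) / len(flagged)


# Hypothetical counts for two regions (illustrative only).
example = {"Region A": (90, 100), "Region B": (240, 300)}
print(weighted_accuracy(example))  # (90 + 240) / (100 + 300) = 0.825

# Hypothetical labels: 1 = transition occurred, 0 = no transition.
actual = [0, 0, 1, 0, 1, 1, 0, 1]
predicted = [0, 0, 1, 0, 0, 1, 0, 0]
print(transition_recall(actual, predicted))  # 2 of 4 transitions caught = 0.5
```

Because transitions are rare, a model can score well on overall accuracy while catching
only a fraction of the transitions, which is why the two figures are worth reporting
separately.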

The overall model accuracies of this study significantly exceed those generated by the
Goldstone (Goldstone et al., 2005), Hegre (Hegre et al., 2011), and CAA (Reed, 2013)
studies. Additionally, the regional models compare favorably with the recent Boekestein
study (Ahner, Boekestein, & Deckro, 2015), ultimately generating higher validation data
set accuracies for all six regions. With this result in mind, it is recommended that the
Eastern Europe and Central Asian region be reevaluated, with a possible reassignment of
some or all of the Central Asian nations to the Arab & North African region, which shares
similar ethnic, political, and geographic features. A key insight gained from the
validation of the logistic regression models is the finding that conflict transitions are
generally easier to predict in nations not in conflict, compared to those currently in
conflict. This finding may be due in part to the quality and accuracy of the statistical
data collected in these nations, resulting from the presence of a more permissive and
less violent environment.

Question  2:  What  factors  are  the  significant  predictors  of  conflict  transitions? 

Thirty independent variables were required to construct the 12 conditional logistic
regression models. Among the six geographic regions, statistics such as ethnic diversity,
burgeoning youth populations, national military expenditures, religious diversity, and
the type of government emerged as the most common and significant factors pertaining to
conflict transitions. While the conditional models within a specific region were always
considerably different from each other, in many cases they share common variables.
However, in certain instances, such as the case with religious diversity in Latin
American nations, the current status of a nation affects whether the variable increases
or decreases the probability of a conflict transition. In the Latin American models, an
increase in the religious diversity score of a nation currently in conflict results in an
increased likelihood that the nation will transition out of conflict in the following
year. However, increasing the same religious diversity score in nations currently not in
conflict corresponds to an increase in the likelihood that these nations will transition
into conflict in the next year. Due to this phenomenon, care must be taken when analyzing
how a particular nation and region will react to the application of national power or
other external forces over an extended period of time.

Question 3: How is the number of global conflicts predicted to change by 2024 and
beyond?

Two-state Markov models were developed for each of the 182 nations considered in this
study using the probabilities generated by the 12 conditional logistic regression models,
enabling the forecasting and trend analysis of regional and global conflict. Over the
next decade, the global incidence of conflict is predicted to remain at 2014 levels, with
84 (46%) nations experiencing some level of violent conflict. However, given 2014
conflict data, the global incidence of conflict is expected to increase to 95 (52%)
nations in the long run. Regionally, conflict levels are predicted to increase by 2024 in
Arab and North African states, Eastern Europe and Central Asia, and South and East Asia;
these trends are predicted to continue past the forecast horizon. Conflict levels within
Central and South America are projected to remain constant over the next decade, with 48%
of the region’s nations experiencing violent conflict. However, conflict rates within
this region are predicted to increase to approximately 70%, with 19 of the 27 nations
predicted to be in a state of violent conflict in the long run. Conversely, conflict
levels are predicted to decrease in OECD and Sub-Saharan African nations over the next
decade and in the long run, with projected 2024 regional conflict incidence rates of 18%
and 37% respectively.
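
As an illustration of the mechanics, a two-state chain with states 0 (Not in Conflict)
and 1 (In Conflict) can be sketched as below. This is a minimal sketch that assumes
time-homogeneous transition probabilities; the function names and the numeric
probabilities are hypothetical and are not taken from the study's fitted models.

```python
def step(prob_conflict: float, p_into: float, p_out: float) -> float:
    """Advance the probability of being in conflict by one year.

    p_into: P(Not in Conflict -> In Conflict)
    p_out:  P(In Conflict -> Not in Conflict)
    """
    return prob_conflict * (1.0 - p_out) + (1.0 - prob_conflict) * p_into


def forecast(in_conflict_now: bool, p_into: float, p_out: float, years: int) -> list:
    """Return the conflict probability for each of the next `years` years."""
    prob = 1.0 if in_conflict_now else 0.0
    path = []
    for _ in range(years):
        prob = step(prob, p_into, p_out)
        path.append(prob)
    return path


def long_run_conflict_probability(p_into: float, p_out: float) -> float:
    """Stationary probability of state 1, valid when p_into + p_out > 0."""
    return p_into / (p_into + p_out)


# Hypothetical nation currently at peace: 10% chance of entering conflict
# each year, 30% chance of leaving it once in conflict.
path = forecast(in_conflict_now=False, p_into=0.10, p_out=0.30, years=10)
print([round(p, 3) for p in path])
print(long_run_conflict_probability(0.10, 0.30))
```

In this toy case the ten-year path rises from the one-step entry probability towards the
long-run value, which illustrates how a ten-year forecast and a long-run incidence can
differ for the same nation.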

Question  4:  What  nations  are  susceptible  to  conflict  transitions;  which  nations 
appear  invulnerable  to  conflict  transitions? 

Identification of nations susceptible or invulnerable to conflict transitions is
predicated on a nation’s Transience Score, which is measured on a continuous scale from 0
to 1. Nations with Transience Scores approaching 1 are said to be susceptible to frequent
conflict transitions, while nations with low scores tend to experience infrequent
conflict transitions. Of the 182 nations considered in this study, 32 nations were
identified as being susceptible to frequent conflict transitions. Out of this sub-group,
Libya, Tunisia, and the United States are projected to experience repeated conflict
transitions over the next decade. This study also identified 113 nations as being
relatively invulnerable to conflict transitions, with these nations remaining either in a
state of conflict or non-conflict over the long run. Cuba, Antigua and Barbuda, and
Kuwait had the three lowest Transience Scores within the study, and thus are not
projected to experience a conflict transition from their current state in the foreseeable
future. As a region, Sub-Saharan Africa, followed by Latin America, is the most
susceptible to conflict transitions, while the OECD and South and East Asian regions are
projected to be the least susceptible to such events.
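
The grouping used above can be sketched as a simple threshold rule. The cut-offs of 0.20
and 0.80 are those reported with the Transience Score histogram (Figure 37); the function
name and category labels are hypothetical, and the sample scores are a handful taken from
Tables 42 and 43 plus one made-up mid-range value.

```python
from collections import Counter


def transience_category(score: float) -> str:
    """Classify a Transience Score using the 0.20 and 0.80 cut-offs."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("Transience Score must lie between 0 and 1")
    if score <= 0.20:
        return "relatively invulnerable"
    if score <= 0.80:
        return "moderate"
    return "susceptible"


# Scores for Libya, the United States, and Cuba from the tables, plus a
# hypothetical mid-range value (0.45) for illustration.
scores = {
    "Libya": 1.00000,
    "United States": 0.99898,
    "Cuba": 0.00000,
    "Hypothetical nation": 0.45,
}
counts = Counter(transience_category(s) for s in scores.values())
print(counts["susceptible"], counts["moderate"], counts["relatively invulnerable"])
```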

Question  5:  Which  nations,  currently  not  in  conflict,  are  identified  as  near-term 
risks  for  transitions  into  violent  conflict? 

Analysis of long-run conflict probabilities and Transience Scores sought to identify
nations that were not in a state of conflict in 2014 but show a tendency towards existing
in a state of violent conflict. This analysis identified Micronesia, Nepal, Qatar,
Trinidad and Tobago, and the United Arab Emirates as at risk for near-term transitions
into violent conflict. This assessment is based on the geographic location of these
nations, current regional political climates, and their proclivity towards long-term
internal conflicts.


5.3  Significance  of  Research 

This study predicts the conflict status of 182 nations with a validation data set
accuracy of 84.67% and provides senior leadership with insight into future conflict
trends for both nations and regions, allowing for both near-term planning and long-range
strategy development. The research provided herein enables the identification of nations
susceptible to conflict transitions, along with relevant factors that may aid or prevent
such a transition. Such findings are vital in the development and implementation of
operational plans and national strategies, as they enable the identification of possible
indicators of impending conflict transitions and the informed allocation of resources to
support our operational and strategic end states. At the same time, it should be evident
that national strategies cannot simply apply “one size fits all” policies to geographic
regions or assume that they will achieve the desired effects within each nation. Care
must be taken to truly ascertain the conflict status of nations of interest and precisely
apply the elements of national power in order to achieve strategic end states.
Additionally, administrations must balance the risks and benefits, as well as the second-
and third-order effects, of such international policies which, given the regional and
global interconnectedness of the 21st century, will undoubtedly have far-reaching and
possibly global ramifications.

5.4  Recommendations  for  Future  Research 

As part of continuing research, this study recommends six areas that may yield
significant analytical insights into nation-state conflict.

Relaxed Forecasting Assumptions

To enable the forecasting of violent conflict, this study assumes that any variable
identified as significant within the model will remain relevant from year to year for the
duration of the conflict forecasting period. Essentially, this study assumes that
conditions present in 2014 will remain unchanged for the foreseeable future. It is
recommended that future analyses look at relaxing this assumption. Similar to the Hegre
conflict model (Hegre et al., 2011), these studies would use existing projections, or
develop internal projections, of relevant independent conflict variables to develop
forecasting and prediction models of regional and global conflict.

Analysis  of  Alternate  Geographic  Regions 

As noted above, it is recommended that the geographic regions used in this study be
reanalyzed and adjusted to improve regional commonality among the nations. In this
regard, a possible alternative is to model the regions as the six geographic Unified
Combatant Commands (UCC). The databases constructed for this study are currently set up
to develop logistic regression models based on either the geographic regions used in this
study or the current areas of responsibility of the combatant commands. Such a study may
yield insights regarding the predicted incidences of conflict within each UCC, and
potential realignment of their respective areas of responsibility. Such an analysis has
immediate operational and strategic relevance following the directive of the Chairman of
the Joint Chiefs of Staff, General Joseph Dunford, to revamp combat commands for the
“fight of the future” (Scarborough, 2015).

Analysis of Significant Conflict Predictor Variables

As part of continuing research, it is recommended that an in-depth analysis be conducted
into the correlation between significant covariates and transitions into conflict. This
analysis would seek to ascertain whether a causal relationship does in fact exist between
the independent covariates and the dependent variable. Such an analysis would also seek
to ascertain how manipulating such variables at both the national and regional levels
affects transitions into conflict.
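
One way to probe how a change in a covariate moves a transition probability is to
evaluate the fitted logistic function at two values of that covariate. The sketch below
is illustrative only: it assumes a standard logistic link, uses a single covariate, and
every coefficient is a hypothetical placeholder chosen to mimic the state-dependent
effect of religious diversity described under Question 2, not a value estimated in this
study.

```python
import math


def transition_probability(intercept: float, coefficient: float, x: float) -> float:
    """Logistic model: P(transition next year) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + coefficient * x)))


# Hypothetical conditional models, one per current conflict state.
# Each gives the probability of leaving that state in the following year.
MODELS = {
    "In Conflict": {"intercept": -2.0, "coefficient": 1.5},      # transition out
    "Not in Conflict": {"intercept": -3.0, "coefficient": 2.0},  # transition in
}


def effect_of_change(state: str, x_before: float, x_after: float) -> float:
    """Change in transition probability when the covariate moves x_before -> x_after."""
    m = MODELS[state]
    before = transition_probability(m["intercept"], m["coefficient"], x_before)
    after = transition_probability(m["intercept"], m["coefficient"], x_after)
    return after - before


for state in MODELS:
    print(state, round(effect_of_change(state, 0.2, 0.6), 4))
```

In this toy setup the same increase raises the chance of leaving conflict for a nation in
conflict and the chance of entering conflict for a nation at peace, so its net effect
depends on the nation's current state. Such a calculation shows association under the
fitted model only; it does not by itself establish the causal relationship that this
recommendation seeks to test.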

Development  and  Implementation  of  the  Border  Conflict  Score  Variable 

The variable Border Conflict Score was identified as significant in many of the
preliminary logistic regression models and, despite its absence in the final models, it
is believed that this variable may be a significant predictor of the spread of violent
conflict. With that being said, the current methodology used to develop this variable
fails to properly account for island nations or nations having large coastlines and few
land borders. As a result, this variable does not effectively model nations such as
Australia or the Philippines, or others that do not share a land border with any other
nation, resulting in a Border Conflict Score of zero. An island and coastal nation analog
to this variable (e.g., shared fisheries, number of international deep water ports,
number of disputed claims to islands, or some other metric that may be used as a vector
to model the spread of conflict) must be developed and implemented for use in future
studies.

Dynamic  Border  Conflict  Variables  in  Forecasting  Models 

Following the use of the Border Conflict Score variable within logistic regression
models, subsequent methodologies may wish to explore forecasting future incidences of
nation-state conflict using a dynamic Border Conflict Score within Markov models. Such a
methodology would allow the nation-specific Border Conflict Score variable to change by
recalculating the conditional probabilities at each transition, followed by Monte Carlo
simulations that obtain average outcomes. Such an analysis would be able to derive
insights into how conflict begins, terminates, and/or spreads based upon interactions at
international boundaries.
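
A rough sketch of this idea follows. It is not a method developed in this study: the
border score is assumed here to be the share of a nation's land neighbors currently in
conflict, the three-nation map is fictional, and the logistic coefficients are
hypothetical placeholders. The sketch recomputes each nation's transition probability
every year from its neighbors' simulated states and averages over many Monte Carlo runs.

```python
import math
import random

# Fictional land-border map and starting states (1 = in conflict).
NEIGHBORS = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
START = {"A": 1, "B": 0, "C": 0}


def border_score(nation: str, state: dict) -> float:
    """Assumed definition: share of land neighbors in conflict (0 if none)."""
    nbrs = NEIGHBORS[nation]
    return sum(state[n] for n in nbrs) / len(nbrs) if nbrs else 0.0


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def simulate_once(years: int, rng: random.Random) -> dict:
    """Run one trajectory, updating transition probabilities every year."""
    state = dict(START)
    for _ in range(years):
        nxt = {}
        for nation, in_conflict in state.items():
            score = border_score(nation, state)
            if in_conflict:
                p_out = logistic(-0.5 - 1.5 * score)   # hypothetical coefficients
                nxt[nation] = 0 if rng.random() < p_out else 1
            else:
                p_into = logistic(-2.5 + 2.0 * score)  # hypothetical coefficients
                nxt[nation] = 1 if rng.random() < p_into else 0
        state = nxt
    return state


def monte_carlo(years: int = 10, runs: int = 2000, seed: int = 0) -> dict:
    """Average final-year conflict indicator per nation over many runs."""
    rng = random.Random(seed)
    totals = {n: 0 for n in START}
    for _ in range(runs):
        final = simulate_once(years, rng)
        for n, s in final.items():
            totals[n] += s
    return {n: totals[n] / runs for n in totals}


print(monte_carlo())
```

A nation with no land neighbors would keep a score of zero under this assumed definition,
which is the limitation noted for island nations in the preceding recommendation.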

Conflict  Spread  through  Interconnected  Regional  Networks 

The final recommendation for future research explores modeling the spread of violent
conflict through interconnected regional networks. Such an analysis may seek to employ a
methodology similar to the Spears study that explored viral epidemiology in connected
networks (Spears, 2001). Such an analysis would require contributions from multiple
disciplines including logistic regression, Markov and stochastic modeling, dynamic
programming, and network analysis. Due to the complexity of this problem, it is
recommended that such a study focus on a specific region, such as South West Asia or
Sub-Saharan Africa, as opposed to a global model. This analysis would explore causes and
develop insights into the spread of violent conflict across international borders due to
such factors as trade, population migrations, or climatic and economic conditions.

Appendix A: Regional Assignments of Nations


Table 44: Regional Assignments of Nations

(A dash indicates that the region has no entry in that row.)

No.  Sub-Saharan Africa                 South and East Asia              Eastern Europe and      Arab & North          Latin America        OECD
                                                                         Central Asia            African States
1    Angola                             Bangladesh                       Afghanistan             Algeria               Antigua and Barbuda  Australia
2    Benin                              Bhutan                           Albania                 Bahrain               Argentina            Austria
3    Botswana                           Brunei Darussalam                Armenia                 Egypt                 Bahamas              Belgium
4    Burkina Faso                       Cambodia                         Azerbaijan              Iraq                  Barbados             Canada
5    Burundi                            China                            Belarus                 Jordan                Belize               Chile
6    Cabo Verde                         Korea, North                     Bosnia and Herzegovina  Kuwait                Bolivia              Czech Republic
7    Cameroon                           Fiji                             Bulgaria                Lebanon               Brazil               Denmark
8    Central African Republic           India                            Croatia                 Libya                 Colombia             Estonia
9    Chad                               Indonesia                        Cyprus                  Morocco               Costa Rica           Finland
10   Comoros                            Kiribati                         Georgia                 Oman                  Cuba                 France
11   Congo, Republic of the             Laos                             Iran                    Qatar                 Dominican Republic   Germany
12   Cote d'Ivoire                      Malaysia                         Kazakhstan              Saudi Arabia          Ecuador              Greece
13   Congo, Democratic Republic of the  Maldives                         Kyrgyzstan              Syria                 El Salvador          Hungary
14   Djibouti                           Micronesia, Federated States of  Latvia                  Tunisia               Grenada              Iceland
15   Equatorial Guinea                  Mongolia                         Lithuania               United Arab Emirates  Guatemala            Ireland
16   Eritrea                            Myanmar                          Malta                   West Bank             Guyana               Israel
17   Ethiopia                           Nepal                            Montenegro              Yemen                 Haiti                Italy
18   Gabon                              Papua New Guinea                 Pakistan                -                     Honduras             Japan
19   Gambia                             Philippines                      Moldova                 -                     Jamaica              Luxembourg
20   Ghana                              Samoa                            Romania                 -                     Nicaragua            Mexico
21   Guinea                             Singapore                        Russia                  -                     Panama               Netherlands
22   Guinea-Bissau                      Solomon Islands                  Serbia                  -                     Paraguay             New Zealand
23   Kenya                              Sri Lanka                        Slovakia                -                     Peru                 Norway
24   Lesotho                            Thailand                         Tajikistan              -                     Suriname             Poland
25   Liberia                            Timor-Leste                      Macedonia               -                     Trinidad and Tobago  Portugal
26   Madagascar                         Tonga                            Turkmenistan            -                     Uruguay              Korea, South
27   Malawi                             Vanuatu                          Ukraine                 -                     Venezuela            Slovenia
28   Mali                               Vietnam                          Uzbekistan              -                     -                    Spain
29   Mauritania                         -                                -                       -                     -                    Sweden
30   Mauritius                          -                                -                       -                     -                    Switzerland
31   Mozambique                         -                                -                       -                     -                    Turkey
32   Namibia                            -                                -                       -                     -                    United Kingdom
33   Niger                              -                                -                       -                     -                    United States
34   Nigeria
35   Rwanda
36   Sao Tome and Principe
37   Senegal
38   Seychelles
39   Sierra Leone
40   Somalia
41   South Africa
42   South Sudan
43   Sudan
44   Swaziland
45   Togo
46   Uganda
47   Tanzania
48   Zambia
49   Zimbabwe



Appendix  B:  Region  Specific  Conditional  Logistic  Regression  Models 
Table  45:  Summary  of  Arab  and  North  African  State  Models 


Arab State Models

Given "Non-Conflict"

| Term | Estimate | Prob > Chi-Sq |
|---|---|---|
| Intercept | 43.828 | 0.0265 |
| 3 Yr Freedom Trend | 22.435 | 0.0781 |
| Ethnic Diversity | -45.350 | 0.0271 |
| Regime Type (Emerging) | -3.107 | 0.2338 |
| Regime Type (Democratic) | 4.444 | 0.0161 |

Model Significance (Prob > Chi-Sq): 0.006
Area Under the Curve (Training): 0.962

Given "Conflict"

| Term | Estimate | Prob > Chi-Sq |
|---|---|---|
| Intercept | 728.886 | 0.009 |
| Death Rate | -16.257 | 0.0104 |
| Life Expectancy | -7.545 | 0.0096 |
| Youth Bulge | -2.279 | 0.0092 |
| Ethnic Diversity | 168.499 | 0.0091 |
| Religious Diversity | -211.666 | 0.0083 |
| Military Expend (% GDP) | 1.317 | 0.0209 |

Model Significance (Prob > Chi-Sq): 0.000
Area Under the Curve (Training): 0.930

Table  46:  Summary  of  Eastern  Europe  &  Central  Asian  Models 

Eastern Europe & Central Asia Models

Given "Non-Conflict"

| Term | Estimate | Prob > Chi-Sq |
|---|---|---|
| Fertility Rate | 8.958 | 0.0051 |
| Infant Mortality Rate | -0.337 | 0.0086 |
| Population Density | 0.078 | 0.0159 |
| Trade (% GDP) | -0.381 | 0.007 |
| Freedom Score | 11.598 | 0.0311 |
| 2 Yr Freedom Trend | -44.417 | 0.0177 |
| 3 Yr Freedom Trend | 58.898 | 0.0012 |
| Religious Diversity | 10.689 | 0.0052 |
| 2 Yr Conflict Intensity Trend | -10.734 | 0.0018 |
| Government Type (Emerging) | 4.796 | 0.0028 |
| Government Type (Democratic) | -0.342 | 0.6275 |
| Government Type (Foreign Interruption) | 6.652 | 0.0004 |

Model Significance (Prob > Chi-Sq): 0.000
Area Under the Curve (Training): 0.972

Given "Conflict"

| Term | Estimate | Prob > Chi-Sq |
|---|---|---|
| Intercept | -28.959 | 0.0011 |
| Arable Land | 2.801 | 0.0037 |
| GDP Per Capita | 0.000 | 0.0047 |
| Improved Water | 0.208 | 0.0024 |

Model Significance (Prob > Chi-Sq): 0.000
Area Under the Curve (Training): 0.946



Table 47: Summary of Latin American Models

Latin American Models

Given "Conflict"

| Term | Estimate | Prob > Chi-Sq |
|---|---|---|
| Religious Diversity | -33.471 | 0.0361 |
| Government Type (Emerging) | 36.483 | 0.0281 |
| Government Type (Democratic) | 37.238 | 0.0233 |
| Government Type (Anarchy) | 30.524 | 0.0287 |

Model Significance (Prob > Chi-Sq): 0.010
Area Under the Curve (Training): 0.878

Given "Non-Conflict"

| Term | Estimate | Prob > Chi-Sq |
|---|---|---|
| Birth Rate | -1.094 | 0.0186 |
| Fertility Rate | 9.990 | 0.009 |
| Improved Water | -0.644 | 0.0031 |
| Infant Mortality Rate | -0.535 | 0.0039 |
| Freedom Score | -7.958 | 0.1997 |
| Ethnic Diversity | -21.442 | 0.0002 |
| Religious Diversity | 11.832 | 0.0005 |
| Regime Type (Democratic) | 72.267 | 0.004 |
| (Religious Diversity - 0.62476)^2 | -18.746 | 0.1017 |

Model Significance (Prob > Chi-Sq): 0.000
Area Under the Curve (Training): 0.952

Table  48:  Summary  of  OECD  Models 


OECD Nation Models

Given "Non-Conflict"

| Term | Estimate | Prob > Chi-Sq |
|---|---|---|
| Intercept | -19.386 | 0.0181 |
| Military Expend (% Gov Spending) | -0.355 | 0.0255 |
| Youth Bulge | 0.489 | 0.0371 |
| Caloric Intake | 0.005 | 0.027 |

Model Significance (Prob > Chi-Sq): 0.001
Area Under the Curve (Training): 0.914

Given "Conflict"

| Term | Estimate | Prob > Chi-Sq |
|---|---|---|
| Intercept | 74.164 | 0.0173 |
| Birth Rate | 1.594 | 0.0709 |
| Death Rate | -1.611 | 0.0262 |
| Military Expend (% GDP) | 5.874 | 0.0145 |
| Infant Mortality Rate | 3.015 | 0.0157 |
| Youth Bulge | -3.148 | 0.0249 |
| Caloric Intake | -0.017 | 0.016 |

Model Significance (Prob > Chi-Sq): 0.000
Area Under the Curve (Training): 0.974

Table 49: Summary of South & East Asian Models

Given "Conflict"

| Term | Estimate | Prob > Chi-Sq |
|---|---|---|
| Intercept | 25.048 | 0.0014 |
| Military Expend (% GDP) | -0.347 | 0.0272 |
| Population Growth | -6.212 | 0.002 |
| Trade (% GDP) | -0.109 | 0.0059 |
| Ethnic Diversity | -12.087 | 0.0066 |
| Government Type (Emerging) | 2.916 | 0.0346 |
| Government Type (Democratic) | 4.402 | 0.004 |

Model Significance (Prob > Chi-Sq): 0.000
Area Under the Curve (Training): 0.938

Given "Non-Conflict"

| Term | Estimate | Prob > Chi-Sq |
|---|---|---|
| Intercept | 17.886 | 0.0126 |
| Arable Land | -20.358 | 0.0325 |
| Military Expend (% GDP) | -2.259 | 0.0175 |
| Refugee (Origin) | 9.60E-06 | 0.0437 |
| Fresh Water per Capita | -5.59E-05 | 0.0162 |
| Caloric Intake | -0.005 | 0.0241 |

Model Significance (Prob > Chi-Sq): 0.000
Area Under the Curve (Training): 0.932
Table 50: Summary of Sub-Saharan African Models

Sub-Saharan Africa Nation Models

Given "Conflict"

| Term | Estimate | Prob > Chi-Sq |
|---|---|---|
| Birth Rate | -0.400 | 0.0047 |
| Life Expectancy | -0.190 | 0.003 |
| Military Expend (% Gov Spending) | -0.158 | 0.0454 |
| Youth Bulge | 0.672 | 0.0006 |
| Refugee (Origin) | 1.17E-05 | 0.0044 |
| Fresh Water per Capita | -7.40423E-05 | 0.0017 |
| Ethnic Diversity | -4.421 | 0.0011 |
| Government Type (Emerging) | 2.797 | 0.0118 |
| Government Type (Democratic) | 3.779 | 0.0034 |
| Government Type (Anarchy) | 26.327 | 0.9999 |
| Government Type (Transition) | -1.530 | 0.3111 |

Model Significance (Prob > Chi-Sq): 0.000
Area Under the Curve (Training): 0.938

Given "Non-Conflict"

| Term | Estimate | Prob > Chi-Sq |
|---|---|---|
| Arable Land | 7.801 | 0.0002 |
| Birth Rate | -0.474 | 0.0031 |
| Infant Mortality Rate | 0.053 | 0.0041 |
| Youth Bulge | 0.346 | 0.011 |
| Refugee (Asylum) | 5.91E-06 | 0.0153 |
| Trade (% GDP) | -0.052 | 0.0051 |
| Freedom Score | -5.637 | 0.0004 |

Model Significance (Prob > Chi-Sq): 0.000
Area Under the Curve (Training): 0.932
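
The estimates in this appendix can be read as terms of a linear predictor on the log-odds scale. The following is a minimal sketch of how such a predictor maps to a probability, assuming the standard logistic link and assuming that each model returns the probability of leaving the given state in the following year (the tables themselves do not state which outcome is coded as the event). The function name, predictor names, and numeric values are hypothetical and are not taken from Tables 45 through 50.

```python
import math


def transition_probability(intercept, coefficients, predictors):
    """Return a logistic probability for one nation-year (illustrative only)."""
    # Linear predictor (log-odds): intercept plus coefficient * value per term.
    log_odds = intercept + sum(
        coefficients[name] * predictors[name] for name in coefficients
    )
    # Numerically stable inverse logit, so very large estimates do not overflow.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    exp_term = math.exp(log_odds)
    return exp_term / (1.0 + exp_term)


# Hypothetical coefficients and predictor values, chosen only for illustration.
example_coefficients = {"youth_bulge": 0.5, "trade_pct_gdp": -0.05}
example_predictors = {"youth_bulge": 4.0, "trade_pct_gdp": 60.0}

probability = transition_probability(1.0, example_coefficients, example_predictors)
print(f"Estimated transition probability: {probability:.3f}")
```

The two-branch form of the inverse logit is used because some intercepts and estimates reported above are large in magnitude, and a naive exponential could overflow for extreme log-odds.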
Appendix C: Markov Model Outputs


Table  51:  Markov  Model  results  by  Nation 
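
As a rough illustration of how a nation-specific two-state Markov chain yields a multi-year forecast, the sketch below propagates the probability of being "In Conflict" forward one year at a time. It assumes transition probabilities that stay fixed over the forecast horizon, which may differ from the implementation used in this study. The function name and the probabilities 0.10 and 0.20 are hypothetical.

```python
def forecast_conflict_probability(p_onset, p_termination, in_conflict_now, years):
    """Return P(In Conflict) for each of the next `years` years.

    p_onset:       assumed P(Not in Conflict -> In Conflict) in one year
    p_termination: assumed P(In Conflict -> Not in Conflict) in one year
    """
    # Start from the known current state (probability 1 or 0 of conflict).
    prob_conflict = 1.0 if in_conflict_now else 0.0
    path = []
    for _ in range(years):
        # Either stay in conflict, or enter conflict from the non-conflict state.
        prob_conflict = (
            prob_conflict * (1.0 - p_termination)
            + (1.0 - prob_conflict) * p_onset
        )
        path.append(prob_conflict)
    return path


# Hypothetical nation: currently not in conflict, ten-year horizon.
path = forecast_conflict_probability(0.10, 0.20, in_conflict_now=False, years=10)
for year, prob in enumerate(path, start=1):
    print(f"Year {year}: P(In Conflict) = {prob:.3f}")
```

Under these fixed-probability assumptions the forecast approaches the chain's long-run share of time in conflict, p_onset / (p_onset + p_termination), as the horizon grows.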

Appendix D: Regional Conflict Forecasts for 2016, 2019, and 2024


Table 52: Arab & North African States 2, 5, and 10 Year Forecasts
Table 53: Eastern Europe & Central Asia 2, 5, and 10 Year Forecasts
Table 54: Latin America 2, 5, and 10 Year Forecasts
Table 55: OECD 2, 5, and 10 Year Forecasts
Table 56: South & East Asia 2, 5, and 10 Year Forecasts
Table 57: Sub-Saharan Africa 2, 5, and 10 Year Forecasts

Appendix E: Transience Scores by Region


Table 58: Arab & North African States Transience Scores
Table 59: Eastern Europe & Central Asia Transience Scores
Table 60: Latin American Transience Scores
Table 61: OECD Transience Scores
Table 62: South & East Asia Transience Scores
Table 63: Sub-Saharan Africa Transience Scores